TheNeuralBit commented on pull request #12067:
URL: https://github.com/apache/beam/pull/12067#issuecomment-649046980


   I suppose another place where global state is ill-advised is when running 
tests, since we run all the tests in the same process and many of them create 
pipelines. 
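
The ordering problem can be shown with a toy model. This is a minimal sketch that assumes, hypothetically, that component ids come from one module-level counter shared by every pipeline in the process. `GloballyNumberedPipeline` and `_GLOBAL_COUNTER` are illustrative names, not Beam code. A pipeline built second gets ids that depend on what was built before it, which is the same situation tests face when they share one process.

```python
import itertools

# Hypothetical process-wide counter standing in for a global component id
# cache. This is NOT Beam's implementation, only an illustration.
_GLOBAL_COUNTER = itertools.count(1)


class GloballyNumberedPipeline:
    """Toy pipeline whose component ids come from a module-level counter."""

    def __init__(self):
        self.pcoll_ids = []

    def add_pcollection(self):
        # Every pipeline in the process draws from the same counter, so the
        # id depends on how many PCollections earlier pipelines created.
        pcoll_id = 'ref_PCollection_PCollection_%d' % next(_GLOBAL_COUNTER)
        self.pcoll_ids.append(pcoll_id)
        return pcoll_id


first = GloballyNumberedPipeline()
first.add_pcollection()
first.add_pcollection()

second = GloballyNumberedPipeline()
# Prints ref_PCollection_PCollection_3 rather than ..._1: state leaks in from
# the first pipeline, so a hardcoded expected id depends on execution order.
print(second.add_pcollection())
```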
   
   Python precommits are failing because of this. It looks like the failing 
tests make assertions about specific component key values, which seems 
brittle. For example:
   
   ```
   self = <apache_beam.runners.interactive.pipeline_instrument_test.PipelineInstrumentTest testMethod=test_cacheable_key_with_version_map>
   
       def test_cacheable_key_with_version_map(self):
         p = beam.Pipeline(interactive_runner.InteractiveRunner())
         # pylint: disable=range-builtin-not-iterating
         init_pcoll = p | 'Init Create' >> beam.Create(range(10))
       
         # It's normal that when executing, the pipeline object is a different
         # but equivalent instance from what user has built. The pipeline instrument
         # should be able to identify if the original instance has changed in an
         # interactive env while mutating the other instance for execution. The
         # version map can be used to figure out what the PCollection instances are
         # in the original instance and if the evaluation has changed since last
         # execution.
         p2 = beam.Pipeline(interactive_runner.InteractiveRunner())
         # pylint: disable=range-builtin-not-iterating
         init_pcoll_2 = p2 | 'Init Create' >> beam.Create(range(10))
         _, ctx = p2.to_runner_api(use_fake_coders=True, return_context=True)
       
         # The cacheable_key should use id(init_pcoll) as prefix even when
         # init_pcoll_2 is supplied as long as the version map is given.
         self.assertEqual(
             instr.cacheable_key(
                 init_pcoll_2,
                 instr.pcolls_to_pcoll_id(p2, ctx),
                 {'ref_PCollection_PCollection_8': str(id(init_pcoll))}),
   >         str(id(init_pcoll)) + '_ref_PCollection_PCollection_8')
   E     AssertionError: '140176476148624_ref_PCollection_PCollection_4539' != '140176476499024_ref_PCollection_PCollection_8'
   E     - 140176476148624_ref_PCollection_PCollection_4539
   E     ?          - ^^                               ^^^^
   E     + 140176476499024_ref_PCollection_PCollection_8
   E     ?           ^^^      
   ```
   
   It's probably easier to scope cached component ids to an individual 
pipeline than to fix all of these tests (and a global cache shared across 
tests will be problematic anyway).
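
A minimal sketch of what pipeline-scoped ids could look like, assuming (hypothetically) a small cache object owned by each pipeline. `ScopedComponentIds` and `ToyPipeline` are illustrative names, not Beam APIs, and the id format only mimics the keys seen in the traceback.

```python
import itertools


class ScopedComponentIds:
    """Hypothetical per-pipeline component id cache (illustrative only).

    Each instance owns its own counter and cache, so the ids it hands out do
    not depend on which other pipelines or tests ran earlier in the process.
    """

    def __init__(self):
        self._counter = itertools.count(1)
        self._ids = {}  # maps a component object to its assigned id

    def get_id(self, component, kind='PCollection'):
        # Reuse the cached id if this component was already registered;
        # otherwise allocate the next id from this pipeline's own counter.
        if component not in self._ids:
            self._ids[component] = 'ref_%s_%s_%d' % (
                kind, kind, next(self._counter))
        return self._ids[component]


class ToyPipeline:
    """Stand-in for a pipeline that carries its own id cache."""

    def __init__(self):
        self.component_ids = ScopedComponentIds()


p1 = ToyPipeline()
a, b = object(), object()
p1_ids = [
    p1.component_ids.get_id(a),
    p1.component_ids.get_id(b),
    p1.component_ids.get_id(a),  # hits the cache, returns a's existing id
]

# A fresh pipeline starts numbering from 1 again, regardless of p1.
p2 = ToyPipeline()
p2_id = p2.component_ids.get_id(object())
```

Because the cache lives on the pipeline object, it is discarded along with the pipeline, so nothing accumulates across the test suite and expected ids stay stable no matter which tests ran first.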


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
