[jira] [Work logged] (BEAM-10308) Component id assignment is not consistent across PipelineContext instances

ASF GitHub Bot (Jira) Wed, 01 Jul 2020 15:29:18 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-10308?focusedWorklogId=453689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453689
 ]


ASF GitHub Bot logged work on BEAM-10308:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jul/20 22:28
            Start Date: 01/Jul/20 22:28
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on a change in pull request #12067:
URL: https://github.com/apache/beam/pull/12067#discussion_r448646153



##########
File path: sdks/python/apache_beam/pipeline.py
##########
@@ -221,6 +221,14 @@ def __init__(self, runner=None, options=None, argv=None):
     # then the transform will have to be cloned with a new label.
     self.applied_labels = set()  # type: Set[str]
 
+    # Create a context for assigning IDs to components. Ensures that any
+    # components that receive an ID during pipeline construction (for example in
+    # ExternalTransform), will receive the same component ID when generating the
+    # full pipeline proto.
+    from apache_beam.runners import pipeline_context

Review comment:
       There's not actually a circular dependency of classes. You could move 
this class in here and that would be fine.

##########
File path: sdks/python/apache_beam/runners/pipeline_context.py
##########
@@ -49,6 +50,32 @@
   from apache_beam.coders.coder_impl import IterableStateWriter
 
 
+class ComponentIdContext(object):

Review comment:
       There's a lot of overlap between the purpose of this class and the
purpose of `PipelineContext`, but I can see how they differ. The pipeline only
takes the context at `to_runner_api` time, and the context has tweaks like
`use_fake_coders` and `default_environment`. So when using xlang expansion you
can keep just the ids and throw away the contents, which will be regenerated
later according to these tweaks.

##########
File path: sdks/python/apache_beam/transforms/external.py
##########
@@ -286,7 +286,8 @@ def expand(self, pvalueish):
     pipeline = (
         next(iter(self._inputs.values())).pipeline
         if self._inputs else pvalueish.pipeline)
-    context = pipeline_context.PipelineContext()
+    context = pipeline_context.PipelineContext(
+        component_id_context=pipeline._component_id_context)

Review comment:
       Based on this use, maybe `_component_id_context` shouldn't be private at
all? And maybe it should be a `component_id_map` or some such. It isn't
"context" in the sense of "the stuff surrounding the thing you are interested
in".
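
A minimal sketch of the idea discussed in this comment, assuming the shared
object is nothing more than an id map handed to every context. The names
`ComponentIdMap`, `ToyPipelineContext` and `get_or_assign` are hypothetical
and are not taken from the pull request or from Beam's API.

```python
class ComponentIdMap(object):
    """Hypothetical shared map from component objects to stable ids."""

    def __init__(self, namespace='ref'):
        self._namespace = namespace
        self._counter = 0
        self._ids = {}  # component object -> assigned id

    def get_or_assign(self, component, label):
        # Assign an id on first sight, then always return the same one.
        if component not in self._ids:
            self._counter += 1
            self._ids[component] = '%s_%s_%d' % (
                self._namespace, label, self._counter)
        return self._ids[component]


class ToyPipelineContext(object):
    """Hypothetical context that delegates id assignment to a shared map."""

    def __init__(self, component_id_map=None):
        if component_id_map is None:
            component_id_map = ComponentIdMap()
        self.component_id_map = component_id_map

    def get_id(self, component, label):
        return self.component_id_map.get_or_assign(component, label)


# One map owned by the (toy) pipeline, passed to every context it creates.
shared_ids = ComponentIdMap()
expansion_ctx = ToyPipelineContext(component_id_map=shared_ids)
final_ctx = ToyPipelineContext(component_id_map=shared_ids)

coder_a, coder_b = object(), object()
expansion_id = expansion_ctx.get_id(coder_b, 'Coder')
final_ctx.get_id(coder_a, 'Coder')
final_id = final_ctx.get_id(coder_b, 'Coder')
```

Because both contexts consult the same map, `coder_b` keeps the id it was
given during expansion even though the second context sees `coder_a` first.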




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 453689)
    Time Spent: 3.5h  (was: 3h 20m)

> Component id assignment is not consistent across PipelineContext instances
> --------------------------------------------------------------------------
>
>                 Key: BEAM-10308
>                 URL: https://issues.apache.org/jira/browse/BEAM-10308
>             Project: Beam
>          Issue Type: Bug
>          Components: cross-language, sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P1
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The "unique ref" ids used in PipelineContext are generated on the fly, which 
> can cause us to get a different id for the same component in different 
> contexts.
> This becomes a problem when ExternalTransform is used, because it creates its 
> own pipeline context for expansion. So it's possible the component ids in the 
> expansion request will actually refer to an entirely different component when 
> the pipeline is finally assembled for execution.
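
The failure mode described above can be reproduced with a toy model. This is
only an illustration under stated assumptions: `ToyContext` and its `get_id`
method are hypothetical stand-ins that mint ids in first-request order, and
they are not Beam's `PipelineContext`.

```python
import itertools


class ToyContext(object):
    """Hypothetical stand-in for a pipeline context (not Beam's class)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._ids = {}  # component object -> assigned id

    def get_id(self, component, label):
        # Ids are generated on the fly, in the order components are requested.
        if component not in self._ids:
            self._ids[component] = 'ref_%s_%d' % (label, next(self._counter))
        return self._ids[component]


coder_a, coder_b = object(), object()

# Context created only for expansion: it sees coder_b first.
expansion_ctx = ToyContext()
expansion_id = expansion_ctx.get_id(coder_b, 'Coder')

# Context used for the final pipeline: it sees coder_a first.
final_ctx = ToyContext()
id_of_a_in_final = final_ctx.get_id(coder_a, 'Coder')
id_of_b_in_final = final_ctx.get_id(coder_b, 'Coder')

print(expansion_id, id_of_a_in_final, id_of_b_in_final)
# prints: ref_Coder_1 ref_Coder_1 ref_Coder_2
```

The id sent in the toy expansion request (`ref_Coder_1`) names `coder_a` in
the final context, which is a different component from the one intended.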



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
