[https://issues.apache.org/jira/browse/BEAM-10308?focusedWorklogId=453689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453689]
ASF GitHub Bot logged work on BEAM-10308:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 22:28
Start Date: 01/Jul/20 22:28
Worklog Time Spent: 10m
Work Description: kennknowles commented on a change in pull request #12067:
URL: https://github.com/apache/beam/pull/12067#discussion_r448646153
##########
File path: sdks/python/apache_beam/pipeline.py
##########
@@ -221,6 +221,14 @@ def __init__(self, runner=None, options=None, argv=None):
# then the transform will have to be cloned with a new label.
self.applied_labels = set() # type: Set[str]
+ # Create a context for assigning IDs to components. Ensures that any
+ # components that receive an ID during pipeline construction (for example in
+ # ExternalTransform), will receive the same component ID when generating the
+ # full pipeline proto.
+ from apache_beam.runners import pipeline_context
Review comment:
There's not actually a circular dependency of classes. You could move
this class in here and that would be fine.
##########
File path: sdks/python/apache_beam/runners/pipeline_context.py
##########
@@ -49,6 +50,32 @@
from apache_beam.coders.coder_impl import IterableStateWriter
+class ComponentIdContext(object):
Review comment:
There's a lot of overlap between the purpose of this and the purpose of
the `PipelineContext`, but I can see how they are different. I see that the
pipeline only takes the context at `to_runner_api` time, and the context has
tweaks like `use_fake_coders` and `default_environment`. So when using xlang
expansion, you can keep just the IDs and throw away the contents that will be
generated according to these tweaks later.
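A minimal sketch of the split described above follows. It assumes a shared object that only remembers which ID belongs to which component, while each context regenerates the contents according to its own tweaks. `ComponentIdMap` borrows the name suggested later in this review. `Context`, `get_or_assign`, and the `ref_<Type>_<n>` ID format are hypothetical and made up for illustration; they are not Beam's actual API.

```python
class ComponentIdMap(object):
  """Hypothetical shared map that gives each component one stable ID."""

  def __init__(self, namespace='ref'):
    self._namespace = namespace
    self._counters = {}  # type name -> number of IDs handed out so far
    self._obj_to_id = {}  # component object -> assigned ID

  def get_or_assign(self, obj):
    # Objects without a custom __hash__/__eq__ are keyed by identity.
    if obj not in self._obj_to_id:
      label = type(obj).__name__
      count = self._counters.get(label, 0) + 1
      self._counters[label] = count
      self._obj_to_id[obj] = '%s_%s_%d' % (self._namespace, label, count)
    return self._obj_to_id[obj]


class Context(object):
  """Toy stand-in for PipelineContext: per-use contents, shared IDs."""

  def __init__(self, id_map, use_fake_coders=False):
    self._id_map = id_map
    self.use_fake_coders = use_fake_coders  # per-context tweak, not shared
    self.components = {}  # ID -> contents generated by this context

  def get_id(self, obj):
    cid = self._id_map.get_or_assign(obj)
    # Contents may differ between contexts; the ID for obj never does.
    self.components[cid] = 'fake' if self.use_fake_coders else repr(obj)
    return cid


class Component(object):
  pass


a, b = Component(), Component()
shared = ComponentIdMap()
expansion_ctx = Context(shared, use_fake_coders=True)
final_ctx = Context(shared)

expansion_ctx.get_id(b)  # 'ref_Component_1', assigned during expansion
final_ctx.get_id(a)  # 'ref_Component_2'
assert expansion_ctx.get_id(b) == final_ctx.get_id(b)
```

Only the ID assignment outlives the expansion context. The `components` contents built with `use_fake_coders=True` can be discarded and regenerated later by the final context.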
##########
File path: sdks/python/apache_beam/transforms/external.py
##########
@@ -286,7 +286,8 @@ def expand(self, pvalueish):
pipeline = (
next(iter(self._inputs.values())).pipeline
if self._inputs else pvalueish.pipeline)
- context = pipeline_context.PipelineContext()
+ context = pipeline_context.PipelineContext(
+ component_id_context=pipeline._component_id_context)
Review comment:
Based on this use, maybe `_component_id_context` isn't private at all?
And maybe it is a `component_id_map` or some such. It isn't "context" in the
sense that it isn't "the stuff surrounding the thing you are interested in".
Issue Time Tracking
-------------------
Worklog Id: (was: 453689)
Time Spent: 3.5h (was: 3h 20m)
> Component id assignment is not consistent across PipelineContext instances
> ---------------------------------------------------------------------------
>
> Key: BEAM-10308
> URL: https://issues.apache.org/jira/browse/BEAM-10308
> Project: Beam
> Issue Type: Bug
> Components: cross-language, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P1
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> The "unique ref" ids used in PipelineContext are generated on the fly, which
> can cause us to get a different id for the same component in different
> contexts.
> This becomes a problem when ExternalTransform is used, because it creates its
> own pipeline context for expansion. So it's possible the component ids in the
> expansion request will actually refer to an entirely different component when
> the pipeline is finally assembled for execution.
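The failure mode can be made concrete with a toy model. It assumes that each context hands out IDs from its own private counter, in the order in which components are first seen. `NaiveContext` and the `ref_<n>` format are hypothetical and are not Beam code. In this model, the same object gets a different ID depending on which context asked for it and in what order.

```python
class NaiveContext(object):
  """Toy context that assigns IDs on the fly from a private counter."""

  def __init__(self):
    self._ids = {}

  def get_id(self, obj):
    if obj not in self._ids:
      # IDs depend only on this context's registration order.
      self._ids[obj] = 'ref_%d' % (len(self._ids) + 1)
    return self._ids[obj]


class Component(object):
  pass


a, b = Component(), Component()

# Expansion builds its own context and only sees `b`.
expansion = NaiveContext()
id_in_request = expansion.get_id(b)  # 'ref_1'

# The final pipeline proto is built later and sees `a` first.
final = NaiveContext()
final.get_id(a)  # 'ref_1'
id_in_final = final.get_id(b)  # 'ref_2'

# 'ref_1' from the expansion request now names `a` in the final proto.
assert id_in_request != id_in_final
```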