[
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=408070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408070
]
ASF GitHub Bot logged work on BEAM-8019:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Mar/20 17:19
Start Date: 23/Mar/20 17:19
Worklog Time Spent: 10m
Work Description: chamikaramj commented on pull request #11185:
[BEAM-8019] Some generalizations to support cross-language transforms.
URL: https://github.com/apache/beam/pull/11185#discussion_r396620882
##########
File path: sdks/python/apache_beam/pipeline.py
##########
@@ -1127,30 +1133,79 @@ def transform_to_runner_api(transform, # type: Optional[ptransform.PTransform]
   def from_runner_api(proto, # type: beam_runner_api_pb2.PTransform
                       context # type: PipelineContext
                      ):
+    side_input_tags = []
+    if common_urns.primitives.PAR_DO.urn == proto.spec.urn:
+      # Preserving side input tags.
+      from apache_beam.utils import proto_utils
+      from apache_beam.portability.api import beam_runner_api_pb2
+      payload = (
+          proto_utils.parse_Bytes(
+              proto.spec.payload, beam_runner_api_pb2.ParDoPayload))
+      for tag, si in payload.side_inputs.items():
+        side_input_tags.append(tag)
+
     # type: (...) -> AppliedPTransform
-    def is_side_input(tag):
-      # type: (str) -> bool
+    def is_python_side_input(tag):
       # As per named_inputs() above.
-      return tag.startswith('side')
+      return re.match(SIDE_INPUT_REGEX, tag)
+
+    all_input_tags = [tag for tag, id in proto.inputs.items()]
+
+    # All side inputs have to be available in input tags
+    python_indexed_side_inputs = False
+    for side_tag in side_input_tags:
+      if side_tag not in all_input_tags:
+        raise Exception(
+            'Side input tag %s is not available in list of input tags %r' %
+            (side_tag, all_input_tags))
+
+      # We process Python and external side inputs differently. We fail early
+      # here if we cannot decide which way to go.
+      if is_python_side_input(side_tag):
+        python_indexed_side_inputs = True
+      else:
+        if python_indexed_side_inputs:
+          raise Exception(
+              'Cannot process side inputs due to inconsistent sideinput '
+              'naming. If using an external transform consider re-naming side '
+              'inputs to not match Python indexed format %s' %
+              SIDE_INPUT_REGEX)
     main_inputs = [
         context.pcollections.get_by_id(id) for tag,
-        id in proto.inputs.items() if not is_side_input(tag)
+        id in proto.inputs.items() if tag not in side_input_tags
     ]
-    # Ordering is important here.
-    indexed_side_inputs = [
-        (get_sideinput_index(tag), context.pcollections.get_by_id(id)) for tag,
-        id in proto.inputs.items() if is_side_input(tag)
-    ]
-    side_inputs = [si for _, si in sorted(indexed_side_inputs)]
+    if python_indexed_side_inputs:
+      # Ordering is important here.
Review comment:
I partially reverted the code here to preserve the old behavior for
Python. The new changes mainly preserve the input tags of remote SDKs
instead of letting Python override them (which breaks Java).
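The tag-handling logic in the diff can be sketched outside Beam as a plain function. This is a minimal sketch under stated assumptions, not Beam's implementation: `split_inputs`, `PY_SIDE_INPUT_PATTERN`, and the plain dict of tag to PCollection id are hypothetical stand-ins for `proto.inputs`, `SIDE_INPUT_REGEX`, and the side-input tags read from `ParDoPayload`. One deliberate difference: the loop in the diff only fails when an external tag follows a Python-indexed one, whereas this sketch rejects any mix of the two regardless of order.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for Python-generated side input tags ('side0', 'side1',
# ...). This is an assumption for illustration, not Beam's SIDE_INPUT_REGEX.
PY_SIDE_INPUT_PATTERN = re.compile(r'side([0-9]+)$')


def split_inputs(
    inputs: Dict[str, str], side_input_tags: List[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Splits a tag -> PCollection-id map into main and side inputs.

    Returns the main-input ids and a list of (tag, id) side-input pairs.
    """
    # Every side input declared by the ParDo must also be a transform input.
    for tag in side_input_tags:
        if tag not in inputs:
            raise ValueError(
                'Side input tag %s is not available in list of input tags %r'
                % (tag, list(inputs)))

    # Decide once whether the tags are Python-indexed or come from an
    # external SDK; any mix of the two is rejected, whatever the order.
    is_indexed = [PY_SIDE_INPUT_PATTERN.match(t) is not None
                  for t in side_input_tags]
    python_indexed = any(is_indexed)
    if python_indexed and not all(is_indexed):
        raise ValueError(
            'Inconsistent side input naming: %r' % side_input_tags)

    main_inputs = [
        pcoll_id for tag, pcoll_id in inputs.items()
        if tag not in side_input_tags
    ]

    if python_indexed:
        # Ordering matters: sort numerically by the index suffix, so that
        # 'side2' comes before 'side10'.
        ordered = sorted(
            (int(PY_SIDE_INPUT_PATTERN.match(tag).group(1)), tag)
            for tag in side_input_tags)
        side_inputs = [(tag, inputs[tag]) for _, tag in ordered]
    else:
        # External (e.g. Java) tags are kept exactly as declared and are
        # never re-tagged by Python.
        side_inputs = [(tag, inputs[tag]) for tag in side_input_tags]

    return main_inputs, side_inputs
```

For example, split_inputs({'input': 'pc_main', 'side10': 'pc_b', 'side2': 'pc_a'}, ['side10', 'side2']) returns (['pc_main'], [('side2', 'pc_a'), ('side10', 'pc_b')]), while a Java-style tag such as 'sideInput' is returned unchanged with its PCollection id.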
Issue Time Tracking
-------------------
Worklog Id: (was: 408070)
Time Spent: 8h 50m (was: 8h 40m)
> Support cross-language transforms for DataflowRunner
> ----------------------------------------------------
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Chamikara Madhusanka Jayalath
> Priority: Major
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)