[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=415892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415892 ]
ASF GitHub Bot logged work on BEAM-8019:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Apr/20 02:06
Start Date: 04/Apr/20 02:06
Worklog Time Spent: 10m
Work Description: chamikaramj commented on pull request #11185:
[BEAM-8019] Updates Python SDK to handle remote SDK coders and preserve tags
added by remote SDKs and propagate restriction coders.
URL: https://github.com/apache/beam/pull/11185#discussion_r403351916
##########
File path: sdks/python/apache_beam/pipeline.py
##########
@@ -1133,29 +1141,67 @@ def from_runner_api(proto,  # type: beam_runner_api_pb2.PTransform
                       context  # type: PipelineContext
                      ):
     # type: (...) -> AppliedPTransform
-    def is_side_input(tag):
-      # type: (str) -> bool
-      # As per named_inputs() above.
-      return tag.startswith('side')
+
+    if common_urns.primitives.PAR_DO.urn == proto.spec.urn:
+      # Preserving side input tags.
+      from apache_beam.portability.api import beam_runner_api_pb2
+      payload = (
+          proto_utils.parse_Bytes(
+              proto.spec.payload, beam_runner_api_pb2.ParDoPayload))
+      side_input_tags = list(payload.side_inputs.keys())
+    else:
+      side_input_tags = []
     main_inputs = [
         context.pcollections.get_by_id(id) for tag,
-        id in proto.inputs.items() if not is_side_input(tag)
+        id in proto.inputs.items() if tag not in side_input_tags
     ]
-    # Ordering is important here.
-    indexed_side_inputs = [
-        (get_sideinput_index(tag), context.pcollections.get_by_id(id)) for tag,
-        id in proto.inputs.items() if is_side_input(tag)
-    ]
-    side_inputs = [si for _, si in sorted(indexed_side_inputs)]
+    def is_python_side_input(tag):
+      # type: (str) -> bool
+      # As per named_inputs() above.
+      return re.match(SIDE_INPUT_REGEX, tag)
+
+    uses_python_sideinput_tags = (
+        is_python_side_input(side_input_tags[0]) if side_input_tags else False)
+
+    if uses_python_sideinput_tags:
+      # Ordering is important here.
+      # TODO(BEAM-9635): use key, value pairs instead of depending on tags with
+      # index as a suffix.
+      indexed_side_inputs = [
+          (get_sideinput_index(tag), context.pcollections.get_by_id(id))
+          for tag,
+          id in proto.inputs.items() if tag in side_input_tags
+      ]
+      side_inputs = [si for _, si in sorted(indexed_side_inputs)]
+    else:
+      side_inputs = [
+          context.pcollections.get_by_id(id) for tag,
+          id in proto.inputs.items() if tag in side_input_tags
+      ]
+
     transform = ptransform.PTransform.from_runner_api(proto, context)
+    if isinstance(transform, RunnerAPIPTransformHolder):
+      # For external transforms that are ParDos, we have to set side-inputs
+      # manually and preserve input tags.
+      transform.side_inputs = [pvalue.AsMultiMap(pc) for pc in side_inputs]
+      input_tags_to_preserve = {
+          context.pcollections.get_by_id(id): tag
+          for tag,
+          id in proto.inputs.items()
+      }
+    else:
Review comment:
That's done in the respective from_runner_api methods. For example,
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py#L1361
RunnerAPIPTransformHolder, however, is a holder type that is constructed
directly, so setting the side inputs here seemed like the best option. We may
be able to move this into the constructor, but that would require passing in
additional parameters to access the inputs, since the Python-only
pvalue.SideInputData.from_runner_api is not an option.
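
For readers following the hunk above, the input-splitting logic can be sketched in plain Python without Beam. This is only an illustration under stated assumptions: the regex value below is a stand-in for the SIDE_INPUT_REGEX named in the diff (the real pattern in pipeline.py may differ), split_inputs is a hypothetical helper that does not exist in Beam, and plain strings stand in for PCollections.

```python
import re

# Assumption: stand-in for the SIDE_INPUT_REGEX referenced in the diff.
SIDE_INPUT_REGEX = r'python_side_input([0-9]+)(-.*)?$'


def get_sideinput_index(tag):
  """Returns the numeric index embedded in a Python-style side input tag."""
  match = re.match(SIDE_INPUT_REGEX, tag)
  if match is None:
    raise ValueError('not a Python side input tag: %r' % tag)
  return int(match.group(1))


def split_inputs(inputs, side_input_tags):
  """Hypothetical helper: splits {tag: pcollection} into (main, side) lists.

  side_input_tags plays the role of the keys of ParDoPayload.side_inputs.
  """
  main = [pc for tag, pc in inputs.items() if tag not in side_input_tags]
  # Only the first tag is inspected, mirroring the diff.
  uses_python_tags = bool(
      side_input_tags and re.match(SIDE_INPUT_REGEX, side_input_tags[0]))
  if uses_python_tags:
    # Python tags carry an index suffix, so sort numerically by that index.
    indexed = [(get_sideinput_index(tag), pc)
               for tag, pc in inputs.items() if tag in side_input_tags]
    side = [pc for _, pc in sorted(indexed)]
  else:
    # Tags from a remote SDK: keep them in the mapping's iteration order.
    side = [pc for tag, pc in inputs.items() if tag in side_input_tags]
  return main, side


# Python-style tags: index 2 sorts before index 10, unlike a string sort.
python_inputs = {
    'python_side_input10': 'pc_c',
    'main': 'pc_a',
    'python_side_input2': 'pc_b',
}
python_main, python_side = split_inputs(
    python_inputs, ['python_side_input10', 'python_side_input2'])

# Tags from another SDK: no index to sort by, so order is preserved.
remote_inputs = {'in': 'pc_a', 'sideB': 'pc_y', 'sideA': 'pc_x'}
remote_main, remote_side = split_inputs(remote_inputs, ['sideA', 'sideB'])
```

The point of the sketch is the branch: side inputs are identified by membership in the payload's tag set rather than by a 'side' prefix, and index-based ordering is applied only when the tags follow the Python naming scheme.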
Issue Time Tracking
-------------------
Worklog Id: (was: 415892)
Time Spent: 15h 50m (was: 15h 40m)
> Support cross-language transforms for DataflowRunner
> ----------------------------------------------------
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Chamikara Madhusanka Jayalath
> Priority: Major
> Time Spent: 15h 50m
> Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.