[
https://issues.apache.org/jira/browse/BEAM-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762315#comment-16762315
]
Ahmet Altay commented on BEAM-6584:
-----------------------------------
This problem is affecting all iterable side inputs. Somehow all side inputs
will be translated into:
Producer -> MapToVoidKeyX -> _DataflowIterableSideInput -> Consumer
Producer -> MapToVoidKeyX
MapToVoidKeyX appears in both cases.
I tracked the graph up to the call to the to_runner_api() and did not see a
malformed date. The issue happens here:
self.proto_pipeline, self.proto_context =
pipeline.to_runner_api(return_context=True)
(here =
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L345)]
> Python SDK creates job graphs with duplicated states when using fn_api
> execution mode.
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-6584
> URL: https://issues.apache.org/jira/browse/BEAM-6584
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Valentyn Tymofieiev
> Priority: Major
>
> We observed this on apache_beam.examples.wordcount with Dataflow runner.
> The graph for this wordcount job contains two steps with the same name
> "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1".
> {noformat}
> ...
> {
> "kind": "PAR_DO_KIND",
> "id": "s41",
> "name": "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1",
> "displayData": [
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.ParDo",
> "strValue": "apache_beam.transforms.core.CallableWrapperDoFn",
> "shortStrValue": "CallableWrapperDoFn",
> "label": "Transform Function"
> },
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.CallableWrapperDoFn",
> "strValue": "\u003clambda\u003e",
> "label": "Transform Function"
> }
> ],
> "outputCollectionName": [
> "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1.out0"
> ],
> "inputCollectionName": [
> "write/Write/WriteImpl/Extract.out0"
> ]
> },
> ...
> {
> "kind": "PAR_DO_KIND",
> "id": "s31",
> "name": "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1",
> "displayData": [
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.ParDo",
> "strValue": "apache_beam.transforms.core.CallableWrapperDoFn",
> "shortStrValue": "CallableWrapperDoFn",
> "label": "Transform Function"
> },
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.CallableWrapperDoFn",
> "strValue": "\u003clambda\u003e",
> "label": "Transform Function"
> }
> ],
> "outputCollectionName": [
> "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1.out0"
> ],
> "inputCollectionName": [
> "write/Write/WriteImpl/Extract.out0"
> ]
> },
> ...
> {noformat}
> CC: [~foegler] [~altay] [~robertwb]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)