[
https://issues.apache.org/jira/browse/BEAM-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760386#comment-16760386
]
Valentyn Tymofieiev commented on BEAM-6584:
-------------------------------------------
Step names in the graph without beam_fn_api experiment:
{noformat}
"name": "count",
"name": "format",
"name": "group",
"name": "pair_with_one",
"name": "read/Read",
"name": "split",
"name": "write/Write/WriteImpl/DoOnce/Read",
"name": "write/Write/WriteImpl/Extract",
"name": "write/Write/WriteImpl/FinalizeWrite/FinalizeWrite",
"name":
"write/Write/WriteImpl/FinalizeWrite/_UnpickledSideInput(Extract.out.0)",
"name":
"write/Write/WriteImpl/FinalizeWrite/_UnpickledSideInput(InitializeWrite.out.0)",
"name":
"write/Write/WriteImpl/FinalizeWrite/_UnpickledSideInput(PreFinalize.out.0)",
"name": "write/Write/WriteImpl/GroupByKey",
"name": "write/Write/WriteImpl/InitializeWrite",
"name": "write/Write/WriteImpl/Pair",
"name": "write/Write/WriteImpl/PreFinalize/PreFinalize",
"name":
"write/Write/WriteImpl/PreFinalize/_UnpickledSideInput(Extract.out.0)",
"name":
"write/Write/WriteImpl/PreFinalize/_UnpickledSideInput(InitializeWrite.out.0)",
"name": "write/Write/WriteImpl/WindowInto(WindowIntoFn)",
"name":
"write/Write/WriteImpl/WriteBundles/_UnpickledSideInput(InitializeWrite.out.0)",
"name": "write/Write/WriteImpl/WriteBundles/WriteBundles",
{noformat}
Step names in the graph with beam_fn_api experiment:
{noformat}
"name": "count",
"name": "format",
"name": "group",
"name": "pair_with_one",
"name": "read/read/impulse",
"name": "read/read/readsplits",
"name": "read/read/reshuffle/addrandomkeys",
"name": "read/read/reshuffle/removerandomkeys",
"name":
"read/read/reshuffle/reshuffleperkey/flatmap(restore_timestamps)",
"name": "read/read/reshuffle/reshuffleperkey/groupbykey",
"name": "read/read/reshuffle/reshuffleperkey/map(reify_timestamps)",
"name": "read/read/split",
"name": "split",
"name": "write/write/writeimpl/doonce/flatmap(\u003clambda at
core.py:2113\u003e)",
"name": "write/write/writeimpl/doonce/impulse",
"name": "write/write/writeimpl/doonce/map(decode)",
"name": "write/write/writeimpl/extract",
"name":
"write/write/writeimpl/finalizewrite/_dataflowiterablesideinput(maptovoidkey0.out.0)",
"name":
"write/write/writeimpl/finalizewrite/_dataflowiterablesideinput(maptovoidkey1.out.0)",
"name":
"write/write/writeimpl/finalizewrite/_dataflowiterablesideinput(maptovoidkey2.out.0)",
"name": "write/write/writeimpl/finalizewrite/finalizewrite",
"name": "write/write/writeimpl/finalizewrite/maptovoidkey0",
"name": "write/write/writeimpl/finalizewrite/maptovoidkey0",
"name": "write/write/writeimpl/finalizewrite/maptovoidkey1",
"name": "write/write/writeimpl/finalizewrite/maptovoidkey1",
"name": "write/write/writeimpl/finalizewrite/maptovoidkey2",
"name": "write/write/writeimpl/finalizewrite/maptovoidkey2",
"name": "write/write/writeimpl/groupbykey",
"name": "write/write/writeimpl/initializewrite",
"name": "write/write/writeimpl/pair",
"name":
"write/write/writeimpl/prefinalize/_dataflowiterablesideinput(maptovoidkey0.out.0)",
"name":
"write/write/writeimpl/prefinalize/_dataflowiterablesideinput(maptovoidkey1.out.0)",
"name": "write/write/writeimpl/prefinalize/maptovoidkey0",
"name": "write/write/writeimpl/prefinalize/maptovoidkey0",
"name": "write/write/writeimpl/prefinalize/maptovoidkey1",
"name": "write/write/writeimpl/prefinalize/maptovoidkey1",
"name": "write/write/writeimpl/prefinalize/prefinalize",
"name": "write/write/writeimpl/windowinto(windowintofn)",
"name":
"write/write/writeimpl/writebundles/_dataflowiterablesideinput(maptovoidkey0.out.0)",
"name": "write/write/writeimpl/writebundles/maptovoidkey0",
"name": "write/write/writeimpl/writebundles/maptovoidkey0",
"name": "write/write/writeimpl/writebundles/writebundles",
{noformat}
> Python SDK creates job graphs with duplicated states when using fn_api
> execution mode.
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-6584
> URL: https://issues.apache.org/jira/browse/BEAM-6584
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: Valentyn Tymofieiev
> Priority: Major
>
> We observed this on apache_beam.examples.wordcount with Dataflow runner.
> The graph for this wordcount job contains two steps with the same name
> "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1".
> {noformat}
> ...
> {
> "kind": "PAR_DO_KIND",
> "id": "s41",
> "name": "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1",
> "displayData": [
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.ParDo",
> "strValue": "apache_beam.transforms.core.CallableWrapperDoFn",
> "shortStrValue": "CallableWrapperDoFn",
> "label": "Transform Function"
> },
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.CallableWrapperDoFn",
> "strValue": "\u003clambda\u003e",
> "label": "Transform Function"
> }
> ],
> "outputCollectionName": [
> "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1.out0"
> ],
> "inputCollectionName": [
> "write/Write/WriteImpl/Extract.out0"
> ]
> },
> ...
> {
> "kind": "PAR_DO_KIND",
> "id": "s31",
> "name": "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1",
> "displayData": [
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.ParDo",
> "strValue": "apache_beam.transforms.core.CallableWrapperDoFn",
> "shortStrValue": "CallableWrapperDoFn",
> "label": "Transform Function"
> },
> {
> "key": "fn",
> "namespace": "apache_beam.transforms.core.CallableWrapperDoFn",
> "strValue": "\u003clambda\u003e",
> "label": "Transform Function"
> }
> ],
> "outputCollectionName": [
> "write/Write/WriteImpl/FinalizeWrite/MapToVoidKey1.out0"
> ],
> "inputCollectionName": [
> "write/Write/WriteImpl/Extract.out0"
> ]
> },
> ...
> {noformat}
> CC: [~foegler] [~altay] [~robertwb]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)