[
https://issues.apache.org/jira/browse/BEAM-7709?focusedWorklogId=274369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-274369
]
ASF GitHub Bot logged work on BEAM-7709:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jul/19 20:38
Start Date: 09/Jul/19 20:38
Worklog Time Spent: 10m
Work Description: lostluck commented on issue #9015: [BEAM-7709] Re-use
node for explicit flattens
URL: https://github.com/apache/beam/pull/9015#issuecomment-509798376
Thanks!
Agreed. That would have been fine and wouldn't have triggered the bug I saw.
In particular, subsequent links were creating new *flatten* nodes, which meant
nothing guarded against each of them forwarding the StartBundle request to the
single downstream DoFn instance. The Go SDK treats each PTransform node as a
state machine to ensure the bundle lifecycle is followed.
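
A minimal sketch of the failure mode and the fix, not the SDK's actual code: the
`sink`, `flatten`, `builder`, and `run` names below are hypothetical stand-ins.
It assumes a flatten node ignores every StartBundle after the first, so the guard
only works when all input links resolve to the same node.

```go
package main

import "fmt"

// sink stands in for the downstream DoFn; it counts StartBundle calls.
type sink struct{ starts int }

func (s *sink) startBundle() { s.starts++ }

// flatten is a hypothetical node that forwards only the first
// StartBundle of a bundle to its downstream consumer.
type flatten struct {
	active bool
	out    *sink
}

func (f *flatten) startBundle() {
	if f.active {
		return // already started for this bundle; do not forward again
	}
	f.active = true
	f.out.startBundle()
}

// builder resolves input links to flatten nodes while tracing a plan.
type builder struct {
	reuse bool                // when true, links to the same ID share one node
	cache map[string]*flatten // flatten nodes created so far, keyed by ID
	out   *sink
}

// link returns the flatten node for the given (hypothetical) transform ID.
func (b *builder) link(id string) *flatten {
	if b.reuse {
		if f, ok := b.cache[id]; ok {
			return f
		}
	}
	f := &flatten{out: b.out}
	b.cache[id] = f
	return f
}

// run traces the given number of input links to one logical flatten,
// starts a bundle through each, and reports how many StartBundle calls
// reached the downstream sink.
func run(reuse bool, inputs int) int {
	out := &sink{}
	b := &builder{reuse: reuse, cache: map[string]*flatten{}, out: out}
	for i := 0; i < inputs; i++ {
		b.link("flatten1").startBundle()
	}
	return out.starts
}

func main() {
	fmt.Println("without reuse:", run(false, 3)) // prints "without reuse: 3"
	fmt.Println("with reuse:", run(true, 3))     // prints "with reuse: 1"
}
```

With a fresh node per link, each node's guard starts out unset, so the downstream
consumer is started once per input. Sharing the first created node restores the
single StartBundle per bundle.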
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 274369)
Time Spent: 1h 10m (was: 1h)
> Flattening multiple outputs of a ParDoN fails
> ---------------------------------------------
>
> Key: BEAM-7709
> URL: https://issues.apache.org/jira/browse/BEAM-7709
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Affects Versions: Not applicable
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: Major
> Fix For: Not applicable
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> If a user calls a beam.ParDoN with N > 2 and then passes one or more of
> the outputs to a flatten, and the flatten occurs SDK side, the SDK currently
> creates multiple flatten nodes. This causes the downstream pardo (the
> DoFn that consumes the Flatten's output) to be initialized multiple times for
> a single bundle.
> The fix is to pre-emptively populate the input links with the first created
> flatten, so subsequent tracings of the plan use the same flatten node, the
> same way the Go direct runner does [1]. That would happen in the exec
> translate code [2].
> [1] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/direct/direct.go#L299
> [2] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/translate.go#L493
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)