[ https://issues.apache.org/jira/browse/CRUNCH-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-88:
-------------------------------
Attachment: CRUNCH-88.patch
Attached a patch that resolves the issue, including an integration test that
reproduces it.
I had hoped to correct this in the planner, but that is non-trivial: dependencies
between PCollectionImpls are set up while the pipeline is being built, so fixing
this in the planner appears to require reworking the structure of the pipeline
graph quite a bit. However, I think a planner-level fix would be a good candidate
to revisit once we start working on fusion optimizations in the planner.
[~jwills], can you take a look at this? If you see a quick way to correct this
in the planner instead, that would be better, but I couldn't spot one.
> Multiple parallelDos on a PGroupedTableImpl does not work
> ---------------------------------------------------------
>
> Key: CRUNCH-88
> URL: https://issues.apache.org/jira/browse/CRUNCH-88
> Project: Crunch
> Issue Type: Bug
> Affects Versions: 0.3.0
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Attachments: CRUNCH-88.patch
>
>
> Creating multiple distinct PCollections based on a single PGroupedTableImpl
> does not work correctly - the content of the PGroupedTableImpl will only be
> sent to a single outgoing PCollection, and all other PCollections that stem
> from the grouped table will not receive any data.
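
To make the expected behavior concrete, here is a minimal plain-Java model of
the scenario. It does not use Crunch: `GroupedFanOutSketch`, `groupByKey`, and
`parallelDo` are hypothetical stand-ins written only for illustration, and the
sketch says nothing about how the planner or the attached patch works. It only
shows the property the report expects to hold, namely that every output derived
from one grouped table sees all of its groups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for a grouped table; NOT Crunch's PGroupedTableImpl.
public class GroupedFanOutSketch {

    /** Groups (key, value) pairs by key, preserving first-seen key order. */
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Applies fn to every (key, values) group and collects the results. */
    static <T> List<T> parallelDo(Map<String, List<Integer>> grouped,
                                  BiFunction<String, List<Integer>, T> fn) {
        List<T> out = new ArrayList<>();
        grouped.forEach((key, values) -> out.add(fn.apply(key, values)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = groupByKey(List.of(
                Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3)));

        // Two distinct outputs derived from the same grouped table.
        List<String> sums = parallelDo(grouped,
                (k, vs) -> k + "=" + vs.stream().mapToInt(Integer::intValue).sum());
        List<String> counts = parallelDo(grouped, (k, vs) -> k + "=" + vs.size());

        // Both outputs should contain one entry per group.
        System.out.println(sums);   // prints [a=4, b=2]
        System.out.println(counts); // prints [a=2, b=1]
    }
}
```

In the reported bug, the equivalent of the second output would come back empty,
because only one of the outgoing collections receives the grouped data.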