[ https://issues.apache.org/jira/browse/CRUNCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ioan Marius Curelariu updated CRUNCH-390:
-----------------------------------------
Attachment: 0001-Patched-the-MSCRPlanner-to-correctly-add-dependencie.patch
I've added a patch that fixes the dependencies between jobs when planning is done in
more than one stage.
The patch also adds an integration test that demonstrates the issue and its
fix.
I've successfully applied it to a freshly cloned repository.
Can you please review my change?
Thank you.
> Planner is not adding dependencies between jobs when planning is done in more
> than one stage.
> ---------------------------------------------------------------------------------------------
>
> Key: CRUNCH-390
> URL: https://issues.apache.org/jira/browse/CRUNCH-390
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.2
> Reporter: Ioan Marius Curelariu
> Assignee: Josh Wills
> Attachments:
> 0001-Patched-the-MSCRPlanner-to-correctly-add-dependencie.patch
>
>
> The planner splits the planning into multiple stages when it finds job
> dependencies on ReadableData. One example of this case is when using the
> BloomFilterJoinStrategy.
> While the generated plan dot file looks correct, the planner does not actually
> add dependencies between jobs that are created in different planning stages.
> I have a pipeline that reads 3 input sources. It joins 2 of them using a
> bloom filter join strategy. Later on, it joins this with the output of a job
> coming from the third source path.
> If the jobs on the branch using the bloom filter finish before the job
> reading the third source, the executor starts the 4th job, which is
> supposed to join everything, before the 3rd one has finished, resulting in
> an input path not found exception.
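A minimal, stdlib-only Java sketch of the failure mode described above may help reviewers. It is not Crunch's MSCRPlanner code: the `Job` class, the `planStageBuggy`/`planStageFixed` methods, the paths, and the assignment of jobs to stages are all hypothetical. Each planning stage wires a job's dependencies to any job whose output it reads. When that search is limited to the current stage, the final join loses its dependency on the job that reads the third source.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of staged planning; NOT the Crunch MSCRPlanner API.
public class StagedPlanSketch {

    static final class Job {
        final String name;
        final Set<String> inputs;
        final Set<String> outputs;
        final Set<Job> dependencies = new LinkedHashSet<>();

        Job(String name, Set<String> inputs, Set<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // For each job in newStage, depend on every job in scope that writes one of its inputs.
    static void wire(List<Job> newStage, List<Job> scope) {
        for (Job job : newStage) {
            for (Job other : scope) {
                if (other == job) continue;
                for (String out : other.outputs) {
                    if (job.inputs.contains(out)) {
                        job.dependencies.add(other);
                        break;
                    }
                }
            }
        }
    }

    // Buggy behaviour from the report: jobs from earlier stages are ignored.
    static void planStageBuggy(List<Job> earlierJobs, List<Job> newStage) {
        wire(newStage, newStage);
    }

    // Fixed behaviour: jobs planned in all earlier stages are also considered.
    static void planStageFixed(List<Job> earlierJobs, List<Job> newStage) {
        List<Job> scope = new ArrayList<>(earlierJobs);
        scope.addAll(newStage);
        wire(newStage, scope);
    }

    static Set<String> dependencyNames(Job job) {
        Set<String> names = new LinkedHashSet<>();
        for (Job d : job.dependencies) names.add(d.name);
        return names;
    }

    // Builds the hypothetical two-stage plan and returns the final join job.
    static Job finalJoin(boolean fixed) {
        Job buildFilter = new Job("build-filter", Set.of("s1"), Set.of("/tmp/filter"));
        Job readThird = new Job("read-third", Set.of("s3"), Set.of("/tmp/third"));
        List<Job> stage1 = List.of(buildFilter, readThird);
        planStageFixed(List.of(), stage1); // first stage: nothing earlier to miss

        Job bloomJoin = new Job("bloom-join", Set.of("s1", "s2", "/tmp/filter"), Set.of("/tmp/joined"));
        Job finalJoin = new Job("final-join", Set.of("/tmp/joined", "/tmp/third"), Set.of("out"));
        List<Job> stage2 = List.of(bloomJoin, finalJoin);
        if (fixed) planStageFixed(stage1, stage2); else planStageBuggy(stage1, stage2);
        return finalJoin;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + dependencyNames(finalJoin(false))); // [bloom-join]
        System.out.println("fixed: " + dependencyNames(finalJoin(true)));  // [read-third, bloom-join]
    }
}
```

In the buggy variant nothing stops the executor from launching `final-join` as soon as `bloom-join` completes, even if `read-third` is still running, which matches the missing-input-path symptom in the report.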
--
This message was sent by Atlassian JIRA
(v6.2#6252)