Pig needs a more efficient DAG execution
----------------------------------------
Key: PIG-1734
URL: https://issues.apache.org/jira/browse/PIG-1734
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
The current code uses Hadoop's Job control to execute one stage at a time. The
first stage includes all jobs with no dependencies, the second stage jobs that
depend only on jobs completed in the first stage, the third stage contains the
jobs that depend on jobs from stage 1 and 2, etc.
The problem with this simplistic approach is that each next stages only starts
when the previous stage is over which means means that some branches of the DAG
are unnecessarily blocked.
We would need to do our own DAG management to solve this issue which would be a
pretty significant undertaking. Something we should look at in the future.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.