[jira] Commented: (PIG-1734) Pig needs a more efficient DAG execution

Arun C Murthy (JIRA) Wed, 17 Nov 2010 13:55:37 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933181#action_12933181
 ]


Arun C Murthy commented on PIG-1734:
------------------------------------

+1 on a more efficient DAG execution engine, and for exploring common 
infrastructure between Pig and Hive.

It's hard to keep this in sync with HIVE-549, but I'll try.

Jeff and I came up with some requirements:

# A way to serialize and exchange this DAG (e.g. Avro, JSON, XML)
# A service to execute the DAG and ensure it runs to completion
# Ability to modify the DAG on the fly, potentially in reaction to the 
execution of a node's parents.
# Possibly shared infrastructure for restarting the necessary components of 
the DAG, etc.
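
To make the first and third requirements concrete, here is a minimal sketch 
in Python. It assumes a hypothetical JSON wire format in which each node maps 
to the list of its parents; the function names and the node names are 
illustrative only and are not part of Pig, Hive, or JobControl.

```python
import json


def dag_to_json(dag):
    """Serialize a DAG given as {node: set_of_parent_nodes} to JSON text."""
    # Sets are not JSON-serializable, so parents are written as sorted lists.
    return json.dumps({node: sorted(parents) for node, parents in dag.items()},
                      sort_keys=True)


def dag_from_json(text):
    """Rebuild the {node: set_of_parent_nodes} mapping from JSON text."""
    return {node: set(parents) for node, parents in json.loads(text).items()}


def add_node(dag, node, parents):
    """Modify the DAG on the fly by attaching a new node to existing parents."""
    missing = set(parents) - set(dag)
    if missing:
        raise KeyError(f"unknown parents: {sorted(missing)}")
    dag[node] = set(parents)


# Hypothetical three-node plan, exchanged as JSON and then extended.
dag = {"load": set(), "filter": {"load"}, "join": {"load", "filter"}}
restored = dag_from_json(dag_to_json(dag))
add_node(restored, "store", ["join"])
```

A real exchange format would also have to carry each node's job 
configuration and state; the sketch covers only the graph structure.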

Given the above, I do not believe Oozie is the right answer. I agree with Zheng 
(https://issues.apache.org/jira/browse/HIVE-1107?focusedCommentId=12805351&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12805351)
 that enhancing JobControl would probably be the sweet spot: this way Pig, 
Hive and even Oozie can use it.

Russel Jurney has also argued against using Oozie: 
https://issues.apache.org/jira/browse/HIVE-1107?focusedCommentId=12888870&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12888870

> Pig needs a more efficient DAG execution
> ----------------------------------------
>
>                 Key: PIG-1734
>                 URL: https://issues.apache.org/jira/browse/PIG-1734
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> The current code uses Hadoop's JobControl to execute one stage at a time. 
> The first stage includes all jobs with no dependencies, the second stage 
> contains the jobs that depend only on jobs completed in the first stage, the 
> third stage contains the jobs that depend on jobs from stages 1 and 2, etc.
> The problem with this simplistic approach is that each stage starts only 
> when the previous stage is over, which means that some branches of the DAG 
> are unnecessarily blocked.
> We would need to do our own DAG management to solve this issue, which would 
> be a pretty significant undertaking. Something we should look at in the future.
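
The blocking described in the issue can be illustrated with a small 
simulation. This is a sketch under stated assumptions, not Pig's actual 
scheduler: job names and durations are made up, the DAG is assumed to be 
acyclic, and the cluster is assumed to have enough capacity to run every 
ready job at once.

```python
def staged_finish_time(deps, duration):
    """Stage-at-a-time: a stage starts only after the whole previous stage ends."""
    done, clock = set(), 0
    while len(done) < len(deps):
        # A job joins the next stage once all of its parents have completed.
        stage = [job for job in deps if job not in done and deps[job] <= done]
        if not stage:
            raise ValueError("dependency cycle")
        clock += max(duration[job] for job in stage)  # wait for the slowest job
        done.update(stage)
    return clock


def eager_finish_time(deps, duration):
    """Dependency-driven: each job starts as soon as its own parents finish."""
    finish = {}

    def finish_of(job):
        if job not in finish:
            start = max((finish_of(parent) for parent in deps[job]), default=0)
            finish[job] = start + duration[job]
        return finish[job]

    return max(finish_of(job) for job in deps)


# Hypothetical plan with two independent branches: A -> C and B -> D.
deps = {"A": set(), "B": set(), "C": {"A"}, "D": {"B"}}
duration = {"A": 1, "B": 10, "C": 10, "D": 1}

print(staged_finish_time(deps, duration))  # 20: C waits for B although it only needs A
print(eager_finish_time(deps, duration))   # 11: C starts right after A finishes
```

In the staged run, job C is held back until the unrelated job B completes, 
which is the unnecessary blocking of a DAG branch that the issue describes.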

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
