[jira] Commented: (HIVE-1107) Generic parallel execution framework for Hive (and Pig, and ...)

Jeff Hammerbacher (JIRA) Wed, 17 Nov 2010 14:15:40 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933194#action_12933194
 ]


Jeff Hammerbacher commented on HIVE-1107:
-----------------------------------------

Okay, thanks. Let me try to pull apart the issues so that I can understand them:

bq. Oozie is more complex than Pig and Hive put together. Compare their manuals, 
both in terms of length and readability.

bq. Oozie is (nearly?) Turing-complete XML, not an easily human-readable script, 
and scheduling one job takes far too much of it.

bq. Also, there is no need to force Oozie either, people can use Azkaban etc. 
for workflow.

Each of these objections seems moot, given that Oozie would be targeted by the 
Hive and Pig developers, not the Hive and Pig users. No Hive or Pig user would 
be required to write Oozie XML: as I understand it, the configuration files 
would be generated by the Hive and Pig query planners.
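
To make that division of labor concrete, here is a minimal sketch of a planner 
emitting workflow XML so that no user writes it by hand. It is an illustration 
only, not Hive's, Pig's, or Oozie's actual code: the function plan_to_workflow 
and the linear list of stage names are hypothetical, the element names follow a 
simplified reading of the Oozie workflow format, and the XML namespace and all 
per-job configuration are omitted.

```python
import xml.etree.ElementTree as ET


def plan_to_workflow(name, stages):
    """Turn an ordered list of stage names into workflow XML.

    Hypothetical helper: a real planner would emit a DAG, not a chain,
    and would fill in the job configuration for every action.
    """
    root = ET.Element("workflow-app", {"name": name})
    ET.SubElement(root, "start", {"to": stages[0]})
    for i, stage in enumerate(stages):
        # Each stage hands off to the next one; the last hands off to "end".
        next_node = stages[i + 1] if i + 1 < len(stages) else "end"
        action = ET.SubElement(root, "action", {"name": stage})
        ET.SubElement(action, "map-reduce")  # job configuration omitted
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "a stage failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")


workflow_xml = plan_to_workflow("query-1", ["scan", "join", "aggregate"])
```

The user-facing input stays a Hive or Pig query; only the planner ever sees the 
generated XML.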

bq. I believe, mid-to-long term, that Pig/Hive will get significantly smarter 
about the way they construct MR jobs - they will want to run some of the nodes 
in the DAG, wait for their output (e.g. a sampler) and then make ever more 
complicated decisions to modify the DAG. I believe Oozie isn't the right tool 
to be using for this purpose.

Adaptive query optimization is indeed a noble goal. Oozie seems to think at the 
level of workflow rather than dataflow, so as you say, it may not be an 
appropriate layer for performing these optimizations. I'm not sure that 
targeting Oozie would detract from the ability of Hive or Pig to perform 
adaptive query optimization, though.
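
For reference, the kind of adaptive execution being discussed (run a sampler, 
wait for its output, then rewrite the rest of the plan) can be sketched as 
below. This is a toy under stated assumptions, not how Hive, Pig, or Oozie 
work: run_adaptive, choose_join, the stage functions, the row counts, and the 
1000-row threshold are all made up for illustration.

```python
from collections import deque


def run_adaptive(initial_stages, replan):
    """Run (name, fn) stages in order, replanning after each one finishes."""
    pending = deque(initial_stages)
    results = {}
    executed = []
    while pending:
        name, fn = pending.popleft()
        results[name] = fn(results)
        executed.append(name)
        # The planner may rewrite everything that has not run yet.
        pending = deque(replan(name, results, list(pending)))
    return executed, results


def sample(results):
    # Stand-in for a sampling job; the row counts are invented.
    return {"left_rows": 10, "right_rows": 1_000_000}


def map_side_join(results):
    return "map-side join"


def shuffle_join(results):
    return "shuffle join"


def choose_join(finished, results, remaining):
    """Hypothetical replanner: pick the join strategy once the sample is in."""
    if finished != "sample":
        return remaining
    smallest = min(results["sample"].values())
    join = map_side_join if smallest < 1000 else shuffle_join
    return [("join", join)] + [s for s in remaining if s[0] != "join"]


executed, results = run_adaptive(
    [("sample", sample), ("join", shuffle_join)], choose_join
)
```

The point of the sketch is only that the decision loop lives in the planner; 
whether the individual stages are submitted through Oozie or directly is a 
separate question.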

Anyway, thanks for the discussion. We're certainly thinking through these 
issues as well.

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution. As we 
> prepare to invest more time improving Hive's plan execution mechanism we 
> should also start to consider ways of building a generic plan execution 
> mechanism that is capable of supporting the needs of Hive and Pig, as well as 
> other Hadoop data flow programming environments. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
