[ https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933194#action_12933194 ]
Jeff Hammerbacher commented on HIVE-1107:
-----------------------------------------

Okay, thanks. Let me try to pull apart the issues so that I can understand them:

bq. Oozie is more complex than Pig and HIVE put together. Compare their manuals, both in terms of length and readability.
bq. Oozie is (nearly?) turing complete XML, not easily human readable script, and scheduling one job takes far too much of it.
bq. Also, there is no need to force Oozie either, people can use Azkaban etc. for workflow.

Each of these objections seems moot, given that Oozie would be targeted by the Hive and Pig developers, not the Hive and Pig users. No Hive or Pig user would be required to write Oozie: the configuration files would be generated by the Hive and Pig query planners, from my understanding.

bq. I believe, mid-to-long term, that Pig/Hive will get significantly smarter about the way they construct MR jobs - they will want to run some of the nodes in the DAG, wait for their output (e.g. a sampler) and then make ever more complicated decisions to modify the DAG. I believe Oozie isn't the right tool to be using for this purpose.

Adaptive query optimization is indeed a noble goal. Oozie seems to think at the level of workflow rather than dataflow, so as you say, it may not be an appropriate layer for performing these optimizations. I'm not sure that it detracts from the ability of Hive or Pig to perform adaptive query optimization, though.

Anyways, thanks for the discussion. We're certainly thinking through these issues as well.

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution.
> As we prepare to invest more time improving Hive's plan execution mechanism we
> should also start to consider ways of building a generic plan execution
> mechanism that is capable of supporting the needs of Hive and Pig, as well as
> other Hadoop data flow programming environments.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.