[ 
https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888870#action_12888870
 ] 

Russell Jurney commented on HIVE-1107:
--------------------------------------

At Jeff's suggestion, my comments on this ticket for Hive and Pig follow.

Oozie has been suggested as a solution to this ticket, but it is, in my 
opinion, far too complex to be appropriate for Pig or Hive.  A scheduler should 
not be more complex than the language it schedules, and Oozie is more complex 
than Pig and Hive put together.  Compare their manuals, both in terms of length 
and readability.  Furthermore, Oozie is (nearly?) Turing-complete XML, not an 
easily human-readable script, and scheduling one job takes far too much of it.

Pig and Hive aim to deliver simplicity and accessibility.  In time Oozie may 
mature, but it is not there yet.  The features are present, but the open-source 
interface is extremely raw.  The only simple interface to Oozie is a 
proprietary GUI.  Perhaps the next major release will be an improvement.

A tight binding between these projects would cause LinkedIn problems, as we use 
Azkaban to schedule Pig jobs.  Scheduling a job in Azkaban consists of creating 
a zip file of your job's content, inserting a very brief config (typically 3-6 
lines), and issuing a one-line command.  The web interface to Azkaban is free.  
This makes it a more appropriate choice for this ticket than Oozie, but making 
Azkaban tightly bound to Pig would be a terrible idea too.

We should be very careful about adding enterprise baggage to these tools that 
is simply not needed for the vast majority of users.  Convention over 
configuration is at the core of Pig and Hive.  Let's not spoil that.

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution. As we 
> prepare to invest more time improving Hive's plan execution mechanism we 
> should also start to consider ways of building a generic plan execution 
> mechanism that is capable of supporting the needs of Hive and Pig, as well as 
> other Hadoop data flow programming environments. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
