[ https://issues.apache.org/jira/browse/OOZIE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1954:
---------------------------------

    Attachment: OOZIE-1954.patch

The patch adds an {{OozieActionConfigurator}} class with one method that 
receives a {{JobConf}} object and can throw an 
{{OozieActionConfiguratorException}} (slightly different from the original 
proposal in the Description above, which passed a {{Configuration}} and 
declared no exception).  Implementations can update the {{JobConf}} object as 
necessary and run any other logic they need; if they need to throw an 
Exception, they can wrap it in an {{OozieActionConfiguratorException}}.
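A minimal sketch of what a user implementation might look like. Assumptions: the method name {{configure}} is a guess (check the patch for the exact signature); the type is modelled as an interface; {{java.util.Properties}} stands in for Hadoop's {{JobConf}} so the example compiles without Hadoop on the classpath; {{SchemaConfigurator}}, {{loadSchema}} and the {{example.*}} property names are hypothetical.

```java
import java.io.IOException;
import java.util.Properties;

// Stand-in for the exception described in the patch: wraps any failure.
class OozieActionConfiguratorException extends Exception {
    OozieActionConfiguratorException(String message, Throwable cause) {
        super(message, cause);
    }
}

// One method that receives the action configuration. Properties stands in
// for Hadoop's JobConf; the method name "configure" is an assumption.
interface OozieActionConfigurator {
    void configure(Properties conf) throws OozieActionConfiguratorException;
}

// Hypothetical user implementation: updates the configuration and wraps
// checked exceptions in OozieActionConfiguratorException.
class SchemaConfigurator implements OozieActionConfigurator {
    @Override
    public void configure(Properties conf) throws OozieActionConfiguratorException {
        try {
            String schema = loadSchema(conf.getProperty("example.schema.name"));
            conf.setProperty("example.output.schema", schema);
        } catch (IOException e) {
            throw new OozieActionConfiguratorException("Could not load the output schema", e);
        }
    }

    // Hypothetical helper; a real one might read a schema file from HDFS.
    private static String loadSchema(String name) throws IOException {
        if (name == null) {
            throw new IOException("example.schema.name is not set");
        }
        return "{\"type\":\"record\",\"name\":\"" + name + "\"}";
    }
}
```

Because the wrapped exception keeps the original cause, whatever calls the configurator can report the underlying error (here the {{IOException}}) instead of a generic failure.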

As I suggested in the Description above, I made this generic enough to work 
with any action type, but only the MapReduce action currently uses or allows 
it.  I don't think the other action types need this feature right now.
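Nothing in the configurator contract is MapReduce-specific, which is what lets it work with any action type. As a sketch (not the patch's actual launcher code), a launcher could load the user's class by name, check that it implements the interface, instantiate it, and call it on the configuration it has already loaded. {{ConfiguratorRunner}} and {{QueueConfigurator}} are hypothetical; {{Properties}} again stands in for {{JobConf}}, and {{configure}} is an assumed method name.

```java
import java.util.Properties;

// Stand-in for the exception type from the patch.
class OozieActionConfiguratorException extends Exception {
    OozieActionConfiguratorException(String message) { super(message); }
    OozieActionConfiguratorException(String message, Throwable cause) { super(message, cause); }
}

// Properties stands in for Hadoop's JobConf; "configure" is an assumed name.
interface OozieActionConfigurator {
    void configure(Properties conf) throws OozieActionConfiguratorException;
}

// Hypothetical launcher-side helper. Nothing here depends on the action
// type, so any launcher could call it after loading its configuration.
final class ConfiguratorRunner {
    private ConfiguratorRunner() {}

    static void run(String className, Properties conf) throws OozieActionConfiguratorException {
        Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new OozieActionConfiguratorException("Configurator class not found: " + className, e);
        }
        // Reject classes that do not implement the interface.
        if (!OozieActionConfigurator.class.isAssignableFrom(clazz)) {
            throw new OozieActionConfiguratorException(className + " does not implement OozieActionConfigurator");
        }
        OozieActionConfigurator configurator;
        try {
            // Requires a no-argument constructor.
            configurator = clazz.asSubclass(OozieActionConfigurator.class)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new OozieActionConfiguratorException("Cannot instantiate " + className, e);
        }
        configurator.configure(conf);
    }
}

// Hypothetical user class, loaded by name the way a class-name field would be.
class QueueConfigurator implements OozieActionConfigurator {
    public QueueConfigurator() {}

    @Override
    public void configure(Properties conf) {
        conf.setProperty("mapreduce.job.queuename", "etl");
    }
}
```

Loading by class name keeps the server side unchanged: it only passes a string through, and the user code runs wherever the runner is called.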

I've added unit tests and even a modified "map-reduce" example.  The 
documentation also explains how to use this feature.

I also tried it out in an actual cluster, including some error cases.

I'll try to get this up on ReviewBoard, but it isn't accepting the patch for 
some reason.

> Add a way for the MapReduce action to be configured by Java code
> ----------------------------------------------------------------
>
>                 Key: OOZIE-1954
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1954
>             Project: Oozie
>          Issue Type: New Feature
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: OOZIE-1954.patch
>
>
> With certain other components (e.g. Avro, HFileOutputFormat (HBase), etc.), it 
> becomes impractical to use the MapReduce action and users must instead use 
> the Java action. The problem is that these components require a lot of extra 
> configuration that is often hidden from the user in Java code (e.g. 
> {{HFileOutputFormat.configureIncrementalLoad(job, table)}}), which can also 
> include decision logic, serialization, and other things that we can't do in 
> an XML file directly.
> One way to solve this problem is to allow the user to give the MR action some 
> Java code that would do this configuration, similar to how we allow the 
> {{<job-xml>}} field to specify an external XML file of configuration 
> properties.
> In more detail, we could have an interface, something like this:
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> public interface OozieActionConfigurator {
>      public void updateOozieActionConfiguration(Configuration conf);
> }
> {code}
> that the user can implement, create a jar, and include with their MR action 
> (i.e. add a "{{<config-class>}}" field that lets them specify the class 
> name). To protect the Oozie server from running user code (which could do 
> anything), it would have to be run in the Launcher Job. The 
> Launcher Job could call this method after it loads the configuration prepared 
> by the Oozie server.
> This will also help users who use the Java action to launch MR jobs and 
> expect a number of things to be done for them that are not (e.g. delegation 
> token propagation, config loading, returning the Hadoop job to Oozie, etc.). 
> These are all done by the MR action, so the more users we can move from the 
> Java action to the MR action, the fewer will run into these difficulties.
> Some of this may change slightly as I actually implement it (e.g. I may 
> need to handle thrown exceptions, etc.).  I may also keep this general 
> enough to be compatible with all action types, in case we want to add it to 
> any of them in the future; for now, though, the schema would only accept it 
> for the MapReduce action.
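To make the description's point concrete: the decision logic and serialization it mentions are easy in Java but cannot be written as static XML properties. A hypothetical configurator might look like this; the size threshold, reducer counts, split points and {{example.*}} property names are made up for illustration, {{Properties}} stands in for {{JobConf}}, and {{configure}} is an assumed method name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Base64;
import java.util.Properties;

class OozieActionConfiguratorException extends Exception {
    OozieActionConfiguratorException(String message, Throwable cause) { super(message, cause); }
}

// Properties stands in for Hadoop's JobConf; "configure" is an assumed name.
interface OozieActionConfigurator {
    void configure(Properties conf) throws OozieActionConfiguratorException;
}

class PartitionPlanConfigurator implements OozieActionConfigurator {
    // Hypothetical threshold: inputs above 10 GiB get more reducers.
    private static final long LARGE_INPUT_BYTES = 10L * 1024 * 1024 * 1024;

    @Override
    public void configure(Properties conf) throws OozieActionConfiguratorException {
        try {
            // Decision logic: choose the reducer count from an input-size hint.
            long inputBytes = Long.parseLong(conf.getProperty("example.input.bytes", "0"));
            int reducers = inputBytes > LARGE_INPUT_BYTES ? 50 : 5;
            conf.setProperty("mapreduce.job.reduces", Integer.toString(reducers));

            // Serialization: store a Java object in the configuration as Base64 text.
            String[] splitPoints = {"g", "n", "t"};
            conf.setProperty("example.split.points", toBase64(splitPoints));
        } catch (NumberFormatException | IOException e) {
            throw new OozieActionConfiguratorException("Could not build the partition plan", e);
        }
    }

    private static String toBase64(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }
}
```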



--
This message was sent by Atlassian JIRA
(v6.2#6252)
