[
https://issues.apache.org/jira/browse/OOZIE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-1954:
---------------------------------
Attachment: OOZIE-1954.patch
The patch adds an {{OozieActionConfigurator}} class with one method that
receives a {{JobConf}} object and can throw an
{{OozieActionConfiguratorException}} (slightly different from the original
proposal in the Description above). Implementations can update the {{JobConf}}
object as necessary and run any other setup logic they need; if they need to
throw an Exception, they can wrap it in an {{OozieActionConfiguratorException}}.
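For illustration, an implementation might look roughly like the sketch below.
The method name {{configure}}, the exception's constructor, the
{{SampleConfigurator}} class, and the {{sample.*}} property names are
assumptions, not taken from this message. {{JobConf}} and the other two types
are minimal stand-ins so the snippet compiles without Hadoop or Oozie on the
classpath; real code would import them instead.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.mapred.JobConf (assumption: only get/set are modeled).
class JobConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

// Stand-in for the exception type added by the patch; the constructor shape is assumed.
class OozieActionConfiguratorException extends Exception {
    OozieActionConfiguratorException(Throwable cause) { super(cause); }
}

// Stand-in for the patch's configurator type; the method name "configure" is assumed.
interface OozieActionConfigurator {
    void configure(JobConf actionConf) throws OozieActionConfiguratorException;
}

// Hypothetical user implementation: derives one property from another and
// wraps any failure in OozieActionConfiguratorException.
class SampleConfigurator implements OozieActionConfigurator {
    @Override
    public void configure(JobConf actionConf) throws OozieActionConfiguratorException {
        try {
            String input = actionConf.get("sample.input.dir");
            if (input == null) {
                throw new IllegalStateException("sample.input.dir is not set");
            }
            // Decision logic like this is what a static XML file cannot express.
            actionConf.set("sample.output.dir", input + "-out");
        } catch (RuntimeException e) {
            throw new OozieActionConfiguratorException(e);
        }
    }
}
```

Wrapping the failure keeps the method's checked-exception contract narrow: the
caller only has to handle one exception type and can still inspect the cause.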
As I suggested in the Description above, I made this generic enough to work
with any action type, but only the MapReduce action is currently using it or
allowing it. I don't think the other action types really need this feature
currently.
I've added unit tests and even a modified "map-reduce" example. The
documentation also explains how to use this feature.
I also tried it out in an actual cluster, including some error cases.
I'll try to get this up on ReviewBoard, but it's not liking the patch for some
reason.
> Add a way for the MapReduce action to be configured by Java code
> ----------------------------------------------------------------
>
> Key: OOZIE-1954
> URL: https://issues.apache.org/jira/browse/OOZIE-1954
> Project: Oozie
> Issue Type: New Feature
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: OOZIE-1954.patch
>
>
> With certain other components (e.g. Avro, HFileOutputFormat (HBase), etc.), it
> becomes impractical to use the MapReduce action and users must instead use
> the Java action. The problem is that these components require a lot of extra
> configuration that is often hidden from the user in Java code (e.g.
> {{HFileOutputFormat.configureIncrementalLoad(job, table);}}), which can also
> include decision logic, serialization, and other things that we can't do in
> an XML file directly.
> One way to solve this problem is to allow the user to give the MR action some
> Java code that would do this configuration, similar to how we allow the
> {{<job-xml>}} field to specify an external XML file of configuration
> properties.
> In more detail, we could have an interface; something like this:
> {code}
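> import org.apache.hadoop.conf.Configuration;
>
> public interface OozieActionConfigurator {
>     // Called with the action's configuration so the implementation can modify it.
>     public void updateOozieActionConfiguration(Configuration conf);
> }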
> {code}
> that the user can implement, package in a jar, and include with their MR action
> (i.e. add a "{{<config-class>}}" field that lets them specify the class
> name). To protect the Oozie server from running user code (which could do
> anything it wants really), it would have to be run in the Launcher Job. The
> Launcher Job could call this method after it loads the configuration prepared
> by the Oozie server.
> This would also help users who use the Java action to launch MR jobs and
> expect a bunch of things to be done for them that are not (e.g. delegation
> token propagation, config loading, returning the hadoop job to Oozie, etc.).
> These are all done with the MR action, so the more users we can move to the
> MR action from the Java action, the less often they'll run into these
> difficulties.
> Some of this may change slightly as I try to actually implement this (e.g.
> I'll have to handle throwing exceptions, etc.). And one thing I may do is keep this
> general enough that it should be compatible with all action types in case we
> want to add this to any of them in the future; though for now, the schema
> would only accept it for the MapReduce action.
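
As a rough sketch of the launcher-side flow in the Description (load the
configuration prepared by the server, then call the user's class by name),
something like the following could work. Everything here is hypothetical: a
{{Map}} stands in for Hadoop's {{Configuration}}, and {{ActionConfigurator}},
{{UserConfigurator}}, {{LauncherSketch}} and {{applyConfigurator}} are
illustrative names, not Oozie APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the proposed interface; a Map replaces Hadoop's Configuration here.
interface ActionConfigurator {
    void updateOozieActionConfiguration(Map<String, String> conf);
}

// Hypothetical user class, the kind that would be named in the action definition.
class UserConfigurator implements ActionConfigurator {
    public UserConfigurator() { }

    @Override
    public void updateOozieActionConfiguration(Map<String, String> conf) {
        conf.put("user.added.property", "true");
    }
}

class LauncherSketch {
    // Copies the server-prepared properties, then lets the named class modify the copy.
    static Map<String, String> applyConfigurator(String className,
                                                 Map<String, String> serverConf)
            throws ReflectiveOperationException {
        Map<String, String> conf = new HashMap<>(serverConf);
        // asSubclass throws ClassCastException if the class does not implement the interface.
        Class<? extends ActionConfigurator> clazz =
                Class.forName(className).asSubclass(ActionConfigurator.class);
        ActionConfigurator configurator = clazz.getDeclaredConstructor().newInstance();
        configurator.updateOozieActionConfiguration(conf);
        return conf;
    }
}
```

Because the class is only resolved inside this method, user code never has to
be on the server's classpath; it only needs to be visible where the launcher
runs.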