Robert Kanter created OOZIE-1954:
------------------------------------
Summary: Add a way for the MapReduce action to be configured by
Java code
Key: OOZIE-1954
URL: https://issues.apache.org/jira/browse/OOZIE-1954
Project: Oozie
Issue Type: New Feature
Affects Versions: trunk
Reporter: Robert Kanter
Assignee: Robert Kanter
With certain other components (e.g. Avro, HFileOutputFormat (HBase), etc.), it
becomes impractical to use the MapReduce action, and users must instead use the
Java action. The problem is that these components require a lot of extra
configuration that is often hidden from the user in Java code (e.g.
{{HFileOutputFormat.configureIncrementalLoad(job, table);}}), which can also
include decision logic, serialization, and other things that we can't do
directly in an XML file.
One way to solve this problem is to allow the user to give the MR action some
Java code that would do this configuration, similar to how we allow the
{{<job-xml>}} field to specify an external XML file of configuration properties.
In more detail, we could have an interface; something like this:
{code}
public interface OozieActionConfigurator {
    public void updateOozieActionConfiguration(Configuration conf);
}
{code}
that the user can implement, package into a jar, and include with their MR action
(i.e. we would add a "{{<config-class>}}" field that lets them specify the class name).
To protect the Oozie server from running user code (which could do anything it
wants really), it would have to be run in the Launcher Job. The Launcher Job
could call this method after it loads the configuration prepared by the Oozie
server.
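As a rough illustration of the flow described above, here is a minimal, self-contained sketch. It is not the actual implementation: {{Configuration}} below is a tiny stand-in for {{org.apache.hadoop.conf.Configuration}} so the example compiles without Hadoop, and {{MyConfigurator}}, {{LauncherSketch}}, {{applyConfigurator}}, and the {{example.output.format}} property are hypothetical names used only for this example. Only the {{OozieActionConfigurator}} interface comes from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.conf.Configuration (assumption: used here only
// so the sketch compiles without Hadoop on the classpath).
class Configuration {
    private final Map<String, String> props = new HashMap<>();

    public void set(String name, String value) {
        props.put(name, value);
    }

    public String get(String name) {
        return props.get(name);
    }

    public String get(String name, String defaultValue) {
        return props.getOrDefault(name, defaultValue);
    }
}

// The interface as proposed in this issue.
interface OozieActionConfigurator {
    void updateOozieActionConfiguration(Configuration conf);
}

// Hypothetical user class: decision logic that an XML file cannot express directly.
class MyConfigurator implements OozieActionConfigurator {
    @Override
    public void updateOozieActionConfiguration(Configuration conf) {
        // "example.output.format" is a made-up property for illustration.
        String format = conf.get("example.output.format", "text");
        if ("hfile".equals(format)) {
            conf.set("mapreduce.job.reduces", "4");
        } else {
            conf.set("mapreduce.job.reduces", "1");
        }
    }
}

// Hypothetical stand-in for the step the Launcher Job would perform.
class LauncherSketch {
    // Takes the configuration prepared by the server, loads the user's class
    // by name, and lets it update the configuration.
    static Configuration applyConfigurator(Configuration prepared, String className)
            throws ReflectiveOperationException {
        OozieActionConfigurator configurator = Class.forName(className)
                .asSubclass(OozieActionConfigurator.class)
                .getDeclaredConstructor()
                .newInstance();
        configurator.updateOozieActionConfiguration(prepared);
        return prepared;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("example.output.format", "hfile");
        applyConfigurator(conf, MyConfigurator.class.getName());
        System.out.println("mapreduce.job.reduces = " + conf.get("mapreduce.job.reduces"));
    }
}
```

Loading the class by name mirrors how a "{{<config-class>}}" value would presumably be consumed; because this happens in the Launcher Job, the user's code never runs inside the Oozie server.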
This would also help users who use the Java action to launch MR jobs and expect
a bunch of things to be done for them that are not (e.g. delegation token
propagation, config loading, returning the Hadoop job to Oozie, etc.). These are
all done by the MR action, so the more users we can move from the Java action to
the MR action, the less they'll run into these difficulties.
Some of this may change slightly as I actually implement it (e.g. the method
will have to handle throwing exceptions). One thing I may do is keep this
general enough to be compatible with all action types, in case we want to add
it to any of them in the future; for now, though, the schema would only accept
it for the MapReduce action.