[
https://issues.apache.org/jira/browse/OOZIE-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556996#comment-15556996
]
Venkat Ranganathan commented on OOZIE-2619:
-------------------------------------------
Test failures unrelated to patch
> Make Hive action defaults to match hive defaults when running from command
> line
> --------------------------------------------------------------------------------
>
> Key: OOZIE-2619
> URL: https://issues.apache.org/jira/browse/OOZIE-2619
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 3.3.0, 4.2.0
> Reporter: Venkat Ranganathan
> Assignee: Venkat Ranganathan
> Attachments: OOZIE-2619.patch
>
>
> Over a few patches, we have done a few fixes to make Oozie Hive actions
> easier for users.
> One of them was OOZIE-2051, which allows default hive and tez site xml configs
> to be added to hive actions automatically by introducing an action-specific
> configuration directory under the oozie conf/action-conf directory. As a bonus,
> in an Ambari managed cluster the hive site changes done as part of the Hive
> components are automatically reflected into the oozie hive action defaults.
> But there is one issue pending for Oozie hive actions.
> Oozie Hive jobs launched via hive action are historically restricted to one
> reducer by default (there are a few other restrictions too, in terms of split
> sizes etc). This is because of the way Oozie action config management is done
> and how Hive determines the number of reducers. Hive uses mapreduce.job.reduces
> to decide whether the number of reducers has to be determined dynamically (when
> this parameter is initialized to the invalid value -1) or is set explicitly by
> the user. In HiveConf, this is internally set to -1 if it is not set in
> hive-site.xml or in one of the set statements.
> Oozie, when it prepares the action configuration, has the
> mapreduce.job.reduces set to 1 (from mapred-default). As part of the hive
> action, Oozie writes the action configuration prepared (the action.xml) also
> as hive-site.xml with the value for mapreduce.job.reduces set to 1.
> There are a few ways to overcome this issue, true to Oozie being very
> flexible with lots of options :). I may be missing a few other
> options here!
> # Explicitly set the mapreduce.job.reduces parameter in the configuration
> element of the action
> Every hive workflow configuration has to be changed
> # Add the parameter to a job-xml for the action
> Once again affects all actions
> # Set the parameter to disable loading of the default *-site.xml
> files as provided by OOZIE-2205
> We need to make sure that the *-site.xml are otherwise available to the
> containers - either have hadoop conf directory (typically /etc/hadoop/conf)
> in the mapred framework classpath or explicitly make the files available using
> other mechanisms (as files, archives, in sharelib etc). The big issue is
> that this affects rolling upgrades once you add an explicit config dependency.
> Unfortunately we can't use the default action config addition introduced in
> OOZIE-2051 for adding one more configuration file to the oozie hive action
> conf directory with hive MR defaults.
> The config files under the action-conf/hive/*.xml or action-conf/hive.xml are
> all merged using the method injectDefaults, which sets a property in the target
> only if it does not already exist in the target configuration map. In our case,
> mapreduce.job.reduces already exists in the action default configuration
> (coming from mapred-default.xml) and hence does not get overwritten from the
> action-conf/hive configuration files.
> The fix (essentially one line of code change) is to use the copy method of
> XConfiguration to copy the action-default config instead of using the
> injectDefaults method and then provide the action-default/hive.xml with the
> required mapred hive parameters set to the initial values Hive expects.
> This patch introduces a change that has potential backward compatibility
> issues.
> * If the action-conf/<action>.xml currently has entries that were no-ops so
> far, they will now take effect in the action configuration.
> * Hive will work as expected when run as an Oozie action without users
> needing to resort to changes!
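The difference between the two merge behaviours described in the issue can be
sketched with plain maps. This is only an illustration under stated
assumptions, not Oozie's actual code: the class MergeSemanticsSketch and the
methods injectDefaultsLike and copyLike are hypothetical stand-ins for the
behaviour the description attributes to injectDefaults and to the copy method
of XConfiguration, and the two sample maps only mimic the values mentioned
above (1 from mapred-default, -1 as the initial value Hive expects).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Oozie.
final class MergeSemanticsSketch {

    static final String REDUCES = "mapreduce.job.reduces";

    // Assumed injectDefaults-style merge: a property from source is added
    // only when target does not already contain that key.
    static void injectDefaultsLike(Map<String, String> source, Map<String, String> target) {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            target.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    // Assumed copy-style merge: every property from source overwrites target.
    static void copyLike(Map<String, String> source, Map<String, String> target) {
        target.putAll(source);
    }

    // Action configuration as prepared by Oozie, carrying the mapred-default value.
    static Map<String, String> baseActionConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REDUCES, "1");
        return conf;
    }

    // Per-action defaults for hive, carrying the value Hive expects initially.
    static Map<String, String> hiveActionDefaults() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REDUCES, "-1");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> injected = baseActionConf();
        injectDefaultsLike(hiveActionDefaults(), injected);
        // The existing value wins, so the hive default is a no-op.
        System.out.println("injectDefaults-style: " + injected.get(REDUCES)); // prints 1

        Map<String, String> copied = baseActionConf();
        copyLike(hiveActionDefaults(), copied);
        // The hive default overwrites the mapred-default value.
        System.out.println("copy-style: " + copied.get(REDUCES)); // prints -1
    }
}
```

With the copy-style merge, any entry already present in the per-action
defaults file also starts overriding the prepared action configuration, which
is the backward compatibility concern listed in the description.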
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)