Venkat Ranganathan created OOZIE-2619:
-----------------------------------------
Summary: Make Hive action defaults to match hive defaults when
running from command line
Key: OOZIE-2619
URL: https://issues.apache.org/jira/browse/OOZIE-2619
Project: Oozie
Issue Type: Bug
Components: core
Affects Versions: 4.2.0, 3.3.0
Reporter: Venkat Ranganathan
Over several patches, we have made a few fixes to make Oozie Hive actions easier
to use.
One of them was OOZIE-2051, which allows the default hive-site and tez-site XML
configs to be added to Hive actions automatically by introducing an
action-specific configuration directory under the Oozie conf/action-conf
directory. As a bonus, in an Ambari-managed cluster, hive-site changes made as
part of the Hive components are automatically reflected in the Oozie Hive action
defaults.
However, one issue remains open for Oozie Hive actions.
Hive jobs launched through the Oozie Hive action have historically been
restricted to one reducer by default (and a few other settings, such as split
sizes, also differ).
This is because of the way Oozie manages action configuration and how Hive
determines the number of reducers. Hive uses mapreduce.job.reduces to decide
whether the reducer count should be determined dynamically (when this parameter
is set to the invalid value -1) or has been set explicitly by the user. In
HiveConf, this is internally set to -1 if it is not present in hive-site.xml or
in a set statement.
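A minimal sketch of that decision, assuming a plain Map<String, String> stands in for HiveConf. The class and method names are hypothetical, and the size-based estimate (one reducer per bytesPerReducer of input, rounded up and clamped to [1, maxReducers]) is a simplified stand-in for Hive's own estimation, not its actual code.

```java
import java.util.Map;

// Hypothetical, simplified model of how Hive picks a reducer count from
// mapreduce.job.reduces. Not Hive's real code path.
final class HiveReducerModel {
    static final String REDUCES_KEY = "mapreduce.job.reduces";

    static int reducersFor(Map<String, String> conf, long totalInputBytes,
                           long bytesPerReducer, int maxReducers) {
        // A missing key behaves like HiveConf's internal default of -1.
        int configured = Integer.parseInt(conf.getOrDefault(REDUCES_KEY, "-1"));
        if (configured >= 0) {
            // Any explicit value (e.g. 1 inherited from mapred-default.xml)
            // is used as-is; no estimation happens.
            return configured;
        }
        // -1 means "estimate dynamically": ceil(input / bytesPerReducer),
        // clamped to the range [1, maxReducers]. Illustrative heuristic only.
        long estimate = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min((long) maxReducers, estimate));
    }
}
```

With 1000 bytes of input, 100 bytes per reducer and a cap of 50, an unset value yields 10 reducers, while an explicit value of 1 yields exactly 1 regardless of input size.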
When Oozie prepares the action configuration, mapreduce.job.reduces is set to 1
(from mapred-default.xml). As part of the Hive action, Oozie also writes the
prepared action configuration (the action.xml) out as hive-site.xml, so
mapreduce.job.reduces is set to 1 there as well.
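To see why the value 1 wins over HiveConf's -1, the layering can be modelled as successive maps in which later entries override earlier ones. This is a hypothetical sketch: HiveSiteLayering and its methods are illustrative names, and real Hadoop/Hive configuration loading has more rules (final properties, variable expansion) than a simple putAll.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of configuration layering: each later layer overrides
// keys from earlier layers, similar to loading successive config resources.
final class HiveSiteLayering {
    static Map<String, String> resolve(List<Map<String, String>> layers) {
        Map<String, String> effective = new HashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer); // later values win
        }
        return effective;
    }

    // Effective mapreduce.job.reduces once Hive applies the hive-site.xml
    // that Oozie generated from the action configuration.
    static String effectiveReduces(Map<String, String> oozieHiveSite) {
        // HiveConf's internal default: estimate reducers dynamically.
        Map<String, String> hiveInternal = Map.of("mapreduce.job.reduces", "-1");
        return resolve(List.of(hiveInternal, oozieHiveSite))
                .get("mapreduce.job.reduces");
    }
}
```

Passing a generated hive-site that carries mapreduce.job.reduces=1 returns "1"; a hive-site without the key leaves Hive's "-1" in place.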
There are a few ways to overcome this issue, true to Oozie being very
flexible with lots of options :). I may be missing a few other
options here!
# Explicitly set the mapreduce.job.reduces parameter in the configuration
element of the action
Every Hive workflow configuration has to be changed.
# Add the parameter to a job-xml for the action
Once again, this affects every action.
# Set the parameter provided by OOZIE-2205 that disables loading of the default
*-site.xml files
We need to make sure that the *-site.xml files are otherwise available to the
containers, either by having the Hadoop conf directory (typically
/etc/hadoop/conf) in the MapReduce framework classpath or by explicitly making
the files available through other mechanisms (as files, archives, in the
sharelib, etc.). The big issue is that this affects rolling upgrades once you
add an explicit config dependency.
Unfortunately, we can't use the default action config mechanism introduced in
OOZIE-2051 to add one more configuration file with the Hive MR defaults to the
Oozie Hive action conf directory.
The config files under action-conf/hive/*.xml or action-conf/hive.xml are all
merged using the injectDefaults method, which sets a property in the target only
if it does not already exist in the target configuration. In our case,
mapreduce.job.reduces already exists in the action default configuration
(coming from mapred-default.xml) and hence is not overwritten by the
action-conf/hive configuration files.
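The merge semantics can be sketched with plain maps. This is a hypothetical model, not Oozie's code: the real XConfiguration.injectDefaults works on Hadoop Configuration objects, while here putIfAbsent plays the role of "set only if missing", and InjectDefaultsModel is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-based model of the injectDefaults merge: a default is
// applied only when the target has no value for that key yet.
final class InjectDefaultsModel {
    static void injectDefaults(Map<String, String> defaults, Map<String, String> target) {
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            target.putIfAbsent(e.getKey(), e.getValue()); // existing keys are kept
        }
    }

    // mapreduce.job.reduces after merging an action-conf/hive default into an
    // action config that already carries mapred-default.xml's value.
    static String reducesAfterInject(String hiveDefault) {
        Map<String, String> actionConf = new HashMap<>();
        actionConf.put("mapreduce.job.reduces", "1"); // from mapred-default.xml
        injectDefaults(Map.of("mapreduce.job.reduces", hiveDefault), actionConf);
        return actionConf.get("mapreduce.job.reduces");
    }
}
```

Calling reducesAfterInject("-1") still returns "1": the Hive-expected value never reaches the action configuration, although keys that were genuinely missing are still added.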
The fix (essentially a one-line code change) is to use the copy method of
XConfiguration to copy the action-default config instead of the injectDefaults
method, and then provide action-conf/hive.xml with the required MapReduce
parameters set to the initial values Hive expects (for example,
mapreduce.job.reduces set to -1).
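For contrast, a sketch of the proposed fix under the same assumptions: plain maps stand in for Hadoop Configuration objects, and CopyFixModel is a hypothetical name rather than Oozie's XConfiguration. Copying overwrites existing keys, so the Hive-expected -1 from action-conf/hive.xml replaces the 1 inherited from mapred-default.xml.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-based model of the proposed fix: copy every entry from
// the action defaults into the action config, overwriting existing values.
final class CopyFixModel {
    static void copy(Map<String, String> source, Map<String, String> target) {
        target.putAll(source); // unconditional overwrite
    }

    // mapreduce.job.reduces after copying an action-conf/hive default over an
    // action config that already carries mapred-default.xml's value.
    static String reducesAfterCopy(String hiveDefault) {
        Map<String, String> actionConf = new HashMap<>();
        actionConf.put("mapreduce.job.reduces", "1"); // from mapred-default.xml
        copy(Map.of("mapreduce.job.reduces", hiveDefault), actionConf);
        return actionConf.get("mapreduce.job.reduces");
    }
}
```

The same unconditional overwrite is what makes the change potentially backward incompatible: any key present in the action defaults now replaces the value already in the action configuration.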
This patch introduces a change that has potential backward compatibility
implications.
* If action-conf/<action>.xml currently has entries that have been no-ops so far
(because the key was already set), they will now take effect in the action
configuration.
* On the positive side, Hive will work as expected when run as an Oozie action,
without users needing to make any changes!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)