Venkat Ranganathan created OOZIE-2619:
-----------------------------------------

             Summary: Make  Hive action defaults to match hive defaults when 
running from command line
                 Key: OOZIE-2619
                 URL: https://issues.apache.org/jira/browse/OOZIE-2619
             Project: Oozie
          Issue Type: Bug
          Components: core
    Affects Versions: 4.2.0, 3.3.0
            Reporter: Venkat Ranganathan


Over a few patches, we have done a few fixes to make Oozie Hive actions easier 
for users.

One of them was OOZIE-2051, which introduced an action-specific configuration 
directory under the Oozie conf/action-conf directory so that the default hive 
and tez site xml configs are added to hive actions automatically. As a bonus, 
in an Ambari-managed cluster, the hive-site changes made as part of the Hive 
components are automatically reflected in the Oozie hive action defaults.

But there is one issue pending for Oozie hive actions.

Oozie Hive jobs launched via the hive action have historically been restricted 
to one reducer by default (there are a few other differences as well, in terms 
of split sizes etc).  This is because of the way Oozie action config management 
is done and how Hive determines the number of reducers.   Hive uses 
mapreduce.job.reduces to decide whether the number of reducers is determined 
dynamically (when this parameter is set to the invalid value -1) or set 
explicitly by the user.   In HiveConf, it is internally set to -1 if it is not 
set in hive-site.xml or in one of the set statements.
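
The reducer decision described above can be sketched roughly as follows. This is a simplified model for illustration, not Hive's actual code: the class name, the chooseReducers helper and the estimated-reducer argument are all hypothetical, and a plain Map stands in for HiveConf.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the decision described above; not Hive's actual code.
class ReducerCountSketch {
    static final String REDUCES_KEY = "mapreduce.job.reduces";

    // Hypothetical helper: a negative configured value means "estimate
    // dynamically"; any other value is taken as the user's explicit choice.
    // An unset key is treated as -1, mirroring the HiveConf default.
    static int chooseReducers(Map<String, String> conf, int estimatedReducers) {
        int configured = Integer.parseInt(conf.getOrDefault(REDUCES_KEY, "-1"));
        return configured < 0 ? estimatedReducers : configured;
    }

    public static void main(String[] args) {
        // Command-line Hive: key not set, so the estimate is used.
        Map<String, String> commandLine = new HashMap<>();
        System.out.println(chooseReducers(commandLine, 12)); // prints 12

        // Oozie hive action: the generated hive-site.xml carries the value 1.
        Map<String, String> oozieAction = new HashMap<>();
        oozieAction.put(REDUCES_KEY, "1");
        System.out.println(chooseReducers(oozieAction, 12)); // prints 1
    }
}
```

The second case is the one this issue is about: the value 1 looks like an explicit user choice, so the dynamic estimate is never used.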

Oozie, when it prepares the action configuration, has the mapreduce.job.reduces 
set to 1 (from mapred-default).   As part of the hive action, Oozie writes the 
action configuration prepared (the action.xml) also as hive-site.xml with the 
value for mapreduce.job.reduces set to 1.

There are a few ways to overcome this issue, true to Oozie being very
flexible with lots of options :).  I may be missing a few other
options here!


# Explicitly set the mapreduce.job.reduces parameter in the configuration 
element of the action
    Every hive workflow configuration has to be changed
#  Add the parameter to a job-xml for the action
    Once again, this affects all actions
#  Set the parameter to disable loading of the default *-site.xml
files as provided by OOZIE-2205
   We need to make sure that the *-site.xml files are otherwise available to the 
containers: either have the hadoop conf directory (typically /etc/hadoop/conf) in 
the mapred framework classpath or explicitly make the files available using other 
mechanisms (as files, archives, in the sharelib, etc).   The big issue is 
that this affects rolling upgrades once you add an explicit config dependency

Unfortunately we can't use the default action config addition introduced in 
OOZIE-2051 for adding one more configuration file to the oozie hive action conf 
directory with hive MR defaults.

The config files under action-conf/hive/*.xml or action-conf/hive.xml are 
all merged using the method injectDefaults, which updates the target only 
if the key does not already exist in the target configuration map.   In our 
case, mapreduce.job.reduces already exists in the action default configuration 
(coming from mapred-default.xml) and hence is not overwritten by the 
action-conf/hive configuration files.
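
To make the merge behaviour concrete, here is a minimal sketch of the "only set if absent" semantics described above. It models the configurations as plain Maps and is not the Oozie source; the class name and the hive.example.key property are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of "inject defaults" merging; not the Oozie implementation.
class InjectDefaultsSketch {
    // Copies a source entry into the target only when the target has no
    // value for that key, so existing target values always win.
    static void injectDefaults(Map<String, String> source, Map<String, String> target) {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            target.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        // Action configuration as prepared by Oozie (value from mapred-default).
        Map<String, String> actionConf = new LinkedHashMap<>();
        actionConf.put("mapreduce.job.reduces", "1");

        // Contents of a file under action-conf/hive (hypothetical values).
        Map<String, String> hiveDefaults = new LinkedHashMap<>();
        hiveDefaults.put("mapreduce.job.reduces", "-1");
        hiveDefaults.put("hive.example.key", "some-value"); // made-up property

        injectDefaults(hiveDefaults, actionConf);

        // The existing key is left alone; only the missing key is added.
        System.out.println(actionConf.get("mapreduce.job.reduces")); // prints 1
        System.out.println(actionConf.get("hive.example.key"));      // prints some-value
    }
}
```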

The fix (essentially a one-line code change) is to use the copy method of 
XConfiguration to copy the action-default config instead of using the 
injectDefaults method, and then provide the action-default/hive.xml with the 
required mapred hive parameters set to the initial values hive expects.
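
The effect of switching to copy-style merging can be sketched as below. As before, this is a plain-Map model under the assumption that copy writes every source entry into the target; the class name and the mapreduce.job.queuename entry are only there for illustration, and this is not the actual patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of "copy" merging; not the actual XConfiguration code.
class CopyOverridesSketch {
    // Writes every source entry into the target, replacing any value that
    // is already present. Keys that exist only in the target are kept.
    static void copy(Map<String, String> source, Map<String, String> target) {
        target.putAll(source);
    }

    public static void main(String[] args) {
        Map<String, String> actionConf = new LinkedHashMap<>();
        actionConf.put("mapreduce.job.reduces", "1");         // from mapred-default
        actionConf.put("mapreduce.job.queuename", "default"); // illustrative extra key

        // Hive-expected initial value supplied through the action defaults file.
        Map<String, String> hiveDefaults = new LinkedHashMap<>();
        hiveDefaults.put("mapreduce.job.reduces", "-1");

        copy(hiveDefaults, actionConf);

        System.out.println(actionConf.get("mapreduce.job.reduces"));   // prints -1
        System.out.println(actionConf.get("mapreduce.job.queuename")); // prints default
    }
}
```

With this behaviour, every entry in the action defaults file takes effect, which is also why entries that used to be silently ignored start to matter.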

This patch introduces a change that has potential backward compatibility issues.

* If the action-conf/<action>.xml currently has entries that were no-ops so 
far, they will now be applied to the action configuration.
* Hive will work as expected when run as an Oozie action without users needing 
to resort to changes!




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
