[ 
https://issues.apache.org/jira/browse/OOZIE-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837284#comment-15837284
 ] 

Abhishek Bafna commented on OOZIE-2619:
---------------------------------------

Thanks [~venkatnrangan] for the patch. Committed to master.

> Make  Hive action defaults to match hive defaults when running from command 
> line
> --------------------------------------------------------------------------------
>
>                 Key: OOZIE-2619
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2619
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.0, 4.2.0
>            Reporter: Venkat Ranganathan
>            Assignee: Venkat Ranganathan
>             Fix For: 5.0.0
>
>         Attachments: OOZIE-2619.patch, OOZIE-2619.patch.2
>
>
> Over a few patches, we have done a few fixes to make Oozie Hive actions 
> easier for users.
> One of them was OOZIE-2051, which allows default hive and tez site xml configs 
> to be added to hive actions automatically by introducing an action-specific 
> configuration directory under the oozie conf/action-conf directory. As a bonus, 
> in an Ambari-managed cluster the hive site changes done as part of the Hive 
> components are automatically reflected in the oozie hive action defaults.
> But there is one issue pending for Oozie hive actions.
> Oozie Hive jobs launched via the hive action have historically been restricted 
> to one reducer by default (there are a few other such differences, for example 
> in split sizes).   This is because of the way Oozie action config management is 
> done and how Hive determines the number of reducers.   Hive uses 
> mapreduce.job.reduces to decide whether the reducers have to be dynamically 
> determined (when this parameter is initialized to the invalid value -1) or 
> are explicitly set by the user.   In HiveConf, this is internally set to -1 
> if it is not set in hive-site.xml or in one of the set statements.
> Oozie, when it prepares the action configuration, has the 
> mapreduce.job.reduces set to 1 (from mapred-default).   As part of the hive 
> action, Oozie writes the action configuration prepared (the action.xml) also 
> as hive-site.xml with the value for mapreduce.job.reduces set to 1.
> There are a few ways to overcome this issue, true to Oozie being very
> flexible with lots of options :).  I may be missing a few other
> options here!
> # Explicitly set the mapreduce.job.reduces parameter in the configuration 
> element of the action
>     Every hive workflow configuration has to be changed
> #  Add the parameter to a job-xml for the action
>     Once again affects all actions
> #  Set the parameter to disable loading of the default *-site.xml
> files as provided by OOZIE-2205
>    We need to make sure that the *-site.xml files are otherwise available to the 
> containers - either have the hadoop conf directory (typically /etc/hadoop/conf) 
> in the mapred framework classpath or explicitly make the files available using 
> other mechanisms (as files, archives, in the sharelib, etc.).   The big issue is 
> that this affects rolling upgrades once you add an explicit config dependency
> Unfortunately we can't use the default action config addition introduced in 
> OOZIE-2051 for adding one more configuration file to the oozie hive action 
> conf directory with hive MR defaults.
> The config files under action-conf/hive/*.xml or action-conf/hive.xml are 
> all merged using the method injectDefaults, which updates the target only 
> if the property does not already exist in the target configuration map.   In 
> our case, mapreduce.job.reduces already exists in the action default 
> configuration (coming from mapred-default.xml) and hence does not get 
> overwritten from the action-conf/hive configuration files.
> The fix (essentially a one-line code change) is to use the copy method of 
> XConfiguration to copy the action-default config instead of using the 
> injectDefaults method, and then provide the action-default/hive.xml with the 
> required mapred hive parameters set to the initial values Hive expects.
> This patch introduces a change that has potential backward compatibility 
> issues.
> * If the action-conf/<action>.xml currently has entries that were no-ops so 
> far, they can now be added to the action configuration and take effect.
> * Hive will work as expected when run as an Oozie action without users 
> needing to resort to changes!
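
For readers of the archive, the difference between the two merge strategies named in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions: plain Java maps stand in for Oozie's XConfiguration, the class and helper names are hypothetical, and the method names merely mirror the ones mentioned in the issue rather than reproducing the actual Oozie signatures. The values 1 and -1 for mapreduce.job.reduces are the ones given in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the two merge strategies described in the issue.
// Plain maps are used instead of Oozie's XConfiguration so the sketch runs on its own.
class MergeSemanticsSketch {

    // "Inject defaults" semantics: a default is applied only when the key is absent.
    static void injectDefaults(Map<String, String> defaults, Map<String, String> target) {
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            target.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    // "Copy" semantics: every source entry overwrites whatever the target holds.
    static void copy(Map<String, String> source, Map<String, String> target) {
        target.putAll(source);
    }

    // Action configuration as prepared per the description: reducers already set to 1.
    static Map<String, String> preparedActionConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.job.reduces", "1");
        return conf;
    }

    // Hive action defaults carrying the initial value Hive expects (-1).
    static Map<String, String> hiveActionDefaults() {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("mapreduce.job.reduces", "-1");
        return defaults;
    }

    public static void main(String[] args) {
        // With injectDefaults the existing value survives, so Hive still sees 1.
        Map<String, String> injected = preparedActionConf();
        injectDefaults(hiveActionDefaults(), injected);
        System.out.println("injectDefaults: " + injected.get("mapreduce.job.reduces"));

        // With copy the action default wins, so Hive sees -1.
        Map<String, String> copied = preparedActionConf();
        copy(hiveActionDefaults(), copied);
        System.out.println("copy: " + copied.get("mapreduce.job.reduces"));
    }
}
```

The same overwrite behaviour is what gives rise to the compatibility note in the description: any entry in an action default file that was silently ignored before would start to win over the prepared value.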



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
