[ https://issues.apache.org/jira/browse/OOZIE-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837284#comment-15837284 ]
Abhishek Bafna commented on OOZIE-2619:
---------------------------------------

Thanks [~venkatnrangan] for the patch. Committed to master.

> Make Hive action defaults to match hive defaults when running from command line
> --------------------------------------------------------------------------------
>
>                 Key: OOZIE-2619
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2619
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.0, 4.2.0
>            Reporter: Venkat Ranganathan
>            Assignee: Venkat Ranganathan
>             Fix For: 5.0.0
>
>         Attachments: OOZIE-2619.patch, OOZIE-2619.patch.2
>
>
> Over a few patches, we have made a few fixes to make Oozie Hive actions
> easier for users.
> One of them was OOZIE-2051, which allows default Hive and Tez site XML
> configs to be added to Hive actions automatically by introducing an
> action-specific configuration directory under the Oozie conf/action-conf
> directory. As a bonus, in an Ambari-managed cluster the hive-site changes
> made as part of the Hive components are automatically reflected in the
> Oozie Hive action defaults.
> But there is one issue pending for Oozie Hive actions.
> Oozie Hive jobs launched via the Hive action are historically restricted to
> one reducer by default (there are a few other restrictions as well, in terms
> of split sizes etc.). This is because of the way Oozie action config
> management is done and how Hive determines the number of reducers. Hive uses
> mapreduce.job.reduces to determine whether the reducers have to be
> dynamically determined (when this parameter is initialized to the invalid
> value -1) or are explicitly set by the user. In HiveConf, this is internally
> set to -1 if it is not set in hive-site.xml or in one of the set statements.
> Oozie, when it prepares the action configuration, has
> mapreduce.job.reduces set to 1 (from mapred-default).
> As part of the Hive action, Oozie also writes the prepared action
> configuration (the action.xml) as hive-site.xml, with the value for
> mapreduce.job.reduces set to 1.
> There are a few ways to overcome this issue, true to Oozie being very
> flexible with lots of options :). I may be missing a few other
> options here!
> # Explicitly set the mapreduce.job.reduces parameter in the configuration
> element of the action
> Every Hive workflow configuration has to be changed
> # Add the parameter to a job-xml for the action
> Once again affects all actions
> # Set the parameter to disable loading of the default *-site.xml
> files as provided by OOZIE-2205
> We need to make sure that the *-site.xml files are otherwise available to
> the containers: either have the Hadoop conf directory (typically
> /etc/hadoop/conf) in the mapred framework classpath, or explicitly make the
> files available using other mechanisms (as files, archives, in sharelib
> etc.). The big issue is that this affects rolling upgrades once you add an
> explicit config dependency
> Unfortunately we can't use the default action config addition introduced in
> OOZIE-2051 to add one more configuration file with Hive MR defaults to the
> Oozie Hive action conf directory.
> The config files under action-conf/hive/*.xml or action-conf/hive.xml are
> all merged using the method injectDefaults, which sets a property in the
> target only if it does not already exist in the target configuration map.
> In our case, mapreduce.job.reduces already exists in the action default
> configuration (coming from mapred-default.xml) and hence does not get
> overwritten from the action-conf/hive configuration files.
> The fix (essentially a one-line code change) is to use the copy method of
> XConfiguration to copy the action-default config instead of using the
> injectDefaults method, and then provide the action-default/hive.xml with the
> required mapred Hive parameters set to the initial values Hive expects.
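
The difference between the two merge behaviours described above can be sketched with plain maps. This is an illustration only: the class name, the helper methods and the map-based configurations are hypothetical stand-ins and are not Oozie's XConfiguration code. The values 1 and -1 are the ones quoted in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: plain maps stand in for Oozie configurations,
// and these helpers are hypothetical, not the real XConfiguration methods.
class HiveDefaultsMergeSketch {

    static final String REDUCES = "mapreduce.job.reduces";

    // "Inject defaults" behaviour as described above: a source entry is
    // added only when the target has no value for that key.
    static void injectDefaults(Map<String, String> source, Map<String, String> target) {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            target.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    // "Copy" behaviour as described above: every source entry overwrites
    // whatever the target already holds.
    static void copy(Map<String, String> source, Map<String, String> target) {
        target.putAll(source);
    }

    // Prepared action configuration: the 1 comes from mapred-default.
    static Map<String, String> preparedActionConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REDUCES, "1");
        return conf;
    }

    // Hive action defaults carrying the initial value Hive expects.
    static Map<String, String> hiveActionDefaults() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REDUCES, "-1");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> injected = preparedActionConf();
        injectDefaults(hiveActionDefaults(), injected);
        // The existing value wins, so Hive still sees a fixed single reducer.
        System.out.println("injectDefaults: " + injected.get(REDUCES)); // injectDefaults: 1

        Map<String, String> copied = preparedActionConf();
        copy(hiveActionDefaults(), copied);
        // The Hive default wins, so Hive can pick the reducer count itself.
        System.out.println("copy: " + copied.get(REDUCES)); // copy: -1
    }
}
```

With inject-style merging the value from mapred-default survives, which matches the single-reducer behaviour reported in this issue; with copy-style merging the value supplied for the Hive action replaces it.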
> This patch introduces a change that has potential backward compatibility
> issues.
> * If the action-conf/<action>.xml currently has entries that were no-ops so
> far, they can now be added to the action configuration.
> * Hive will work as expected when run as an Oozie action without users
> needing to resort to changes!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)