[ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115898#comment-16115898 ]
BELUGA BEHR commented on HIVE-16758:
------------------------------------

[~csun] Thank you for the feedback.

The reason I came across this issue in the first place was that I had to perform some tests using Hive-on-Spark on a 3-node cluster. I repeatedly ran into the inconvenience of my queries failing immediately, because the hard-coded default of 10 was larger than my 3-node cluster and my {{dfs.replication.max}} was set to 3. After each failure, I would have to raise {{dfs.replication.max}} to 10 to continue my testing. Users should be able to run Hive-on-Spark on a 3-node cluster without additional configuration; it is scaling Hive-on-Spark up that should require additional configuration, not the other way around.

I can change the variable name. It's not my call regarding {{mapred.submit.replication}}; however, since it was not already being used in this context, I would not recommend introducing a deprecated configuration into new code.

> Better Select Number of Replications
> ------------------------------------
>
>                 Key: HIVE-16758
>                 URL: https://issues.apache.org/jira/browse/HIVE-16758
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HIVE-16758.1.patch
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number. We should add a
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}.
> This value should be around the square root of the number of nodes and not
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>
> @Override
> protected void initializeOp(Configuration hconf) throws HiveException {
>   ...
>   int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
>   // minReplication value should not cross the value of dfs.replication.max
>   minReplication = Math.min(minReplication, dfsMaxReplication);
> }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
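The heuristic proposed in the issue (a replication factor around the square root of the node count, never exceeding {{dfs.replication.max}}) can be sketched as follows. The class and method names here are hypothetical illustrations, not taken from the HIVE-16758 patch:

```java
// Sketch of the proposed replication heuristic: target roughly
// sqrt(numNodes), capped by dfs.replication.max, and at least 1.
// Names are illustrative only, not from the actual patch.
public class ReplicationHeuristic {

    /**
     * @param numNodes          number of nodes in the cluster
     * @param dfsReplicationMax the cluster's dfs.replication.max value
     * @return a replication factor in [1, dfsReplicationMax]
     */
    static int chooseReplication(int numNodes, int dfsReplicationMax) {
        // Round sqrt(numNodes) up so small clusters still get some replication.
        int target = (int) Math.ceil(Math.sqrt(numNodes));
        return Math.max(1, Math.min(target, dfsReplicationMax));
    }

    public static void main(String[] args) {
        // On the 3-node cluster from the comment, this yields 2 instead of
        // the hard-coded 10, so the query no longer fails immediately.
        System.out.println(chooseReplication(3, 3));
        System.out.println(chooseReplication(100, 512));
    }
}
```

On a 3-node cluster with {{dfs.replication.max}} of 3 this picks 2, avoiding the immediate failure described above, while a 100-node cluster gets 10, matching the order of magnitude of the old hard-coded default.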