BELUGA BEHR created HIVE-16758:
----------------------------------

             Summary: Better Select Number of Replications
                 Key: HIVE-16758
                 URL: https://issues.apache.org/jira/browse/HIVE-16758
             Project: Hive
          Issue Type: Improvement
            Reporter: BELUGA BEHR
            Priority: Minor


{{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}

We should be smarter about how we pick the replication factor.  We should add
a new configuration property equivalent to
{{mapreduce.client.submit.file.replication}}.  The value should be around the
square root of the number of nodes, not hard-coded to 10 as it is today.

{code}
  public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
  private int minReplication = 10;

  @Override
  protected void initializeOp(Configuration hconf) throws HiveException {
    ...
    int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
    // minReplication value should not cross the value of dfs.replication.max
    minReplication = Math.min(minReplication, dfsMaxReplication);
  }
{code}
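
A rough sketch of what the selection could look like, not a patch. Assumptions: the property name {{hive.spark.hashtable.sink.file.replication}} is made up for illustration and does not exist in Hive; a plain {{Map}} stands in for the Hadoop {{Configuration}} so the snippet runs on its own; and the node count is passed in by the caller, since how the operator would obtain it is left open here.

```java
import java.util.Map;

final class ReplicationChooser {
    // Existing HDFS key, as used in the snippet above.
    static final String DFS_REPLICATION_MAX = "dfs.replication.max";
    // Hypothetical key, named only for illustration; not an existing Hive property.
    static final String SINK_REPLICATION = "hive.spark.hashtable.sink.file.replication";

    private ReplicationChooser() {
    }

    /**
     * Picks a replication factor: the configured value if present, otherwise
     * roughly sqrt(numNodes), and in either case capped by dfs.replication.max.
     */
    static int chooseReplication(Map<String, String> conf, int numNodes) {
        // Default: ceiling of the square root of the node count, never below 1.
        int sqrtDefault = (int) Math.ceil(Math.sqrt(Math.max(numNodes, 1)));
        int requested = getInt(conf, SINK_REPLICATION, sqrtDefault);
        // If no maximum is configured, the requested value is its own cap.
        int max = getInt(conf, DFS_REPLICATION_MAX, requested);
        return Math.max(1, Math.min(requested, max));
    }

    // Stand-in for Configuration.getInt(name, defaultValue).
    private static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}
```

With an empty configuration and 100 nodes this returns 10, the same as the current hard-coded value. Setting {{dfs.replication.max}} to 3 caps the result at 3, and setting the hypothetical property overrides the square-root default.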

https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
