Chao Sun commented on HIVE-16758:

Thanks [~belugabehr] for the patch, and my apologies for replying so late! The 
patch looks good: I agree that {{dfs.replication.max}} should not be used here. 

A few minor comments:
1. When we set the replication factor, should we also consider 
{{mapred.submit.replication}}? (Even though it's deprecated, people may 
still use it.)
2. Can we rename {{MIN_REPLICATION}} to something like {{DEFAULT_REPLICATION}}? 
Also, can we leave it at 10? I assume this will benefit HoS users on large 
clusters. For smaller clusters, people can set the config to a smaller value.
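To make the proposed lookup order concrete, here is a minimal sketch of the fallback chain discussed above: prefer {{mapreduce.client.submit.file.replication}}, fall back to the deprecated {{mapred.submit.replication}}, default to 10, and cap by {{dfs.replication.max}} when it is set. The class {{ReplicationSketch}} and its helpers are hypothetical illustrations, not part of the patch, and plain {{java.util.Properties}} stands in for Hadoop's {{Configuration}}:

```java
import java.util.Properties;

// Hypothetical sketch of the config fallback chain; a real implementation
// would use Hadoop's Configuration, not java.util.Properties.
public class ReplicationSketch {

    static final int DEFAULT_REPLICATION = 10;

    static int chooseReplication(Properties conf) {
        // Prefer the current key, then the deprecated one, then the default.
        String submit = conf.getProperty(
                "mapreduce.client.submit.file.replication",
                conf.getProperty("mapred.submit.replication",
                        Integer.toString(DEFAULT_REPLICATION)));
        int replication = Integer.parseInt(submit);
        // Never exceed the cluster-wide cap, if one is configured.
        String max = conf.getProperty("dfs.replication.max");
        if (max != null) {
            replication = Math.min(replication, Integer.parseInt(max));
        }
        return Math.max(1, replication);
    }

    // The square-root heuristic the issue description suggests:
    // roughly sqrt(number of cluster nodes), at least 1.
    static int sqrtHeuristic(int numNodes) {
        return Math.max(1, (int) Math.ceil(Math.sqrt(numNodes)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(chooseReplication(conf)); // prints 10 (default)

        conf.setProperty("dfs.replication.max", "3");
        System.out.println(chooseReplication(conf)); // prints 3 (capped)

        System.out.println(sqrtHeuristic(100));      // prints 10
    }
}
```

With this shape, a small cluster only needs {{dfs.replication.max}} (or a lower submit-file setting) to avoid over-replicating, while large HoS clusters keep the default of 10.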

> Better Select Number of Replications
> ------------------------------------
>                 Key: HIVE-16758
>                 URL: https://issues.apache.org/jira/browse/HIVE-16758
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HIVE-16758.1.patch
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication factor.  We should add a 
> new configuration equivalent to {{mapreduce.client.submit.file.replication}}. 
> This value should be around the square root of the number of nodes, not 
> hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
>     int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
>     // minReplication value should not cross the value of dfs.replication.max
>     minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

This message was sent by Atlassian JIRA