[
https://issues.apache.org/jira/browse/HIVE-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102625#comment-16102625
]
Rui Li commented on HIVE-17146:
-------------------------------
[~cabot], the code intends to distribute the hash table to more nodes so that
subsequent tasks are more likely to read the data from a local DN. In that sense,
it's intended to be bigger than {{dfs.replication}}. That's why we chose the
magic number 10 (not an ideal solution, I agree).
However, since {{minReplication = Math.min(minReplication,
dfsMaxReplication)}}, I still don't understand how the replication factor can
exceed {{dfs.replication.max}} (512 by default)?
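To spell out the clamp with the default value (a hand-worked sketch, not output
from the operator; {{hconf}} stands in for the Hive configuration and
{{DFS_REPLICATION_MAX}} is the literal key "dfs.replication.max"):
{code}
int minReplication = 10;
// Note: Configuration.getInt() returns the fallback -- i.e. 10 -- when the key
// is absent from the resources the client-side Configuration actually loaded:
int dfsMaxReplication = hconf.getInt("dfs.replication.max", minReplication); // 512 with the default
minReplication = Math.min(minReplication, dfsMaxReplication); // min(10, 512) = 10
// fs.create() then requests 10 replicas, well below the maximum of 512,
// so the NameNode should accept the request.
{code}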
> Spark on Hive - Exception while joining tables - "Requested replication
> factor of 10 exceeds maximum of x"
> -----------------------------------------------------------------------------------------------------------
>
> Key: HIVE-17146
> URL: https://issues.apache.org/jira/browse/HIVE-17146
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.1.1, 3.0.0
> Reporter: George Smith
> Assignee: Ashutosh Chauhan
>
> We found a bug in the current implementation of
> [org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java].
> The *magic number 10* used as the minReplication factor can cause an exception
> when the configuration parameter _dfs.replication_ is lower than 10.
> Consider these [property settings|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]
> on our cluster (with fewer than 10 nodes):
> {code}
> dfs.namenode.replication.min=1
> dfs.replication=2
> dfs.replication.max=512 (that's the default value)
> {code}
> The current implementation computes the target file replication as follows
> (relevant snippets of the code):
> {code}
> private int minReplication = 10;
> ...
> // Falls back to minReplication (10) when dfs.replication.max is absent
> // from the loaded configuration:
> int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
> // minReplication value should not cross the value of dfs.replication.max
> minReplication = Math.min(minReplication, dfsMaxReplication);
> ...
> FileSystem fs = path.getFileSystem(htsOperator.getConfiguration());
> // The path's default replication, i.e. dfs.replication:
> short replication = fs.getDefaultReplication(path);
> ...
> int numOfPartitions = replication;
> // Floors the result at minReplication, overriding dfs.replication
> // whenever it is below 10:
> replication = (short) Math.max(minReplication, numOfPartitions);
> // use this replication value in fs.create(path, replication);
> {code}
> With the current code the replication value actually used is 10, and the
> config value _dfs.replication_ is not used at all.
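> Tracing the snippet with the cluster values above makes this concrete (a
> hand-worked example, not actual output):
> {code}
> int minReplication = 10;
> int dfsMaxReplication = 512;  // hconf.getInt("dfs.replication.max", 10)
> minReplication = Math.min(minReplication, dfsMaxReplication); // min(10, 512) = 10
> short replication = 2;        // fs.getDefaultReplication(path) -> dfs.replication
> int numOfPartitions = replication;
> replication = (short) Math.max(minReplication, numOfPartitions); // max(10, 2) = 10
> // fs.create() requests 10 replicas on a cluster with fewer than 10 nodes.
> {code}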
> There are probably several (easy) ways to fix it:
> # Set the field to {code}private int minReplication = 1;{code} (I don't see any
> obvious reason for the value 10), or
> # Initialize minReplication from the config value _dfs.namenode.replication.min_,
> with a default of 1, or
> # Compute the replication as {code}replication = Math.min(numOfPartitions,
> dfsMaxReplication);{code} (see the sketch after this list), or
> # Use {code}replication = numOfPartitions;{code} directly.
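> For illustration, option 3 applied to the snippet could look roughly like this
> (an untested sketch; {{chooseReplication}} is a hypothetical helper, not code
> from the class):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Honor the filesystem's default replication (dfs.replication) and only cap
> // it at dfs.replication.max, instead of forcing a floor of 10:
> static short chooseReplication(FileSystem fs, Path path, Configuration hconf) {
>   short defaultReplication = fs.getDefaultReplication(path);
>   int dfsMaxReplication = hconf.getInt("dfs.replication.max", defaultReplication);
>   return (short) Math.min(defaultReplication, dfsMaxReplication);
> }
> {code}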
> The config value _dfs.replication_ has a default of 3 and is supposed to always
> be lower than _dfs.replication.max_, so no extra check is probably needed.
> Any suggestions on which option to choose?
> As a *workaround* for this issue we had to set dfs.replication.max=2, but
> obviously the _dfs.replication_ value should NOT be ignored and the problem
> should be properly resolved.
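> Traced through the same snippet, the workaround takes effect because the clamp
> now actually kicks in (hand-worked, assuming the Hive client picks up the new
> value):
> {code}
> int minReplication = 10;
> int dfsMaxReplication = 2;    // dfs.replication.max=2 from the workaround
> minReplication = Math.min(minReplication, dfsMaxReplication); // min(10, 2) = 2
> short replication = 2;        // fs.getDefaultReplication(path)
> replication = (short) Math.max(minReplication, replication);  // max(2, 2) = 2
> // fs.create() now requests 2 replicas, matching dfs.replication.
> {code}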
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)