[ https://issues.apache.org/jira/browse/SPARK-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174165#comment-14174165 ]

Marcelo Vanzin commented on SPARK-3979:
---------------------------------------

BTW, this would avoid issues like this:

{noformat}
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /user/systest/.sparkStaging/application_1413485082283_0001/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0.jar. Requested replication 3 exceeds maximum 1
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:943)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInt(FSNamesystem.java:2243)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2233)
        ...
        at org.apache.spark.deploy.yarn.ClientBase$class.copyFileToRemote(ClientBase.scala:101)
{noformat}

> Yarn backend's default file replication should match HDFS's default one
> -----------------------------------------------------------------------
>
>                 Key: SPARK-3979
>                 URL: https://issues.apache.org/jira/browse/SPARK-3979
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>
> This code in ClientBase.scala sets the replication used for files uploaded to 
> HDFS:
> {noformat}
>     val replication = sparkConf.getInt("spark.yarn.submit.file.replication", 3).toShort
> {noformat}
> Instead of the hardcoded "3" (which happens to be HDFS's default), it should 
> use the default value from the HDFS configuration ("dfs.replication").
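
A minimal sketch of the fix described above, assuming the {{FileSystem}} handle ({{fs}}) and destination {{Path}} ({{dst}}) that {{copyFileToRemote}} already works with are in scope; {{FileSystem.getDefaultReplication(Path)}} returns the filesystem's configured default (i.e. "dfs.replication" on HDFS):

{noformat}
// Honor an explicit spark.yarn.submit.file.replication when set; otherwise
// fall back to the filesystem's own default instead of a hardcoded 3, so
// clusters with fewer than 3 DataNodes don't reject the upload.
val replication: Short = sparkConf
  .getOption("spark.yarn.submit.file.replication")
  .map(_.toShort)
  .getOrElse(fs.getDefaultReplication(dst))
{noformat}

This keeps the existing config knob authoritative when it is set, while an unset value follows the cluster's "dfs.replication", avoiding the "Requested replication 3 exceeds maximum 1" failure shown above.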


