[ https://issues.apache.org/jira/browse/SPARK-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174165#comment-14174165 ]
Marcelo Vanzin commented on SPARK-3979:
---------------------------------------

BTW, this would avoid issues like this:
{noformat}
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /user/systest/.sparkStaging/application_1413485082283_0001/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0.jar. Requested replication 3 exceeds maximum 1
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:943)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInt(FSNamesystem.java:2243)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2233)
	...
	at org.apache.spark.deploy.yarn.ClientBase$class.copyFileToRemote(ClientBase.scala:101)
{noformat}

> Yarn backend's default file replication should match HDFS's default one
> -----------------------------------------------------------------------
>
>                 Key: SPARK-3979
>                 URL: https://issues.apache.org/jira/browse/SPARK-3979
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>
> This code in ClientBase.scala sets the replication used for files uploaded
> to HDFS:
> {noformat}
> val replication = sparkConf.getInt("spark.yarn.submit.file.replication", 3).toShort
> {noformat}
> Instead of a hardcoded "3" (which is the default value for HDFS), it should
> be using the default value from the HDFS conf ("dfs.replication").

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
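A sketch of the suggested fix, not the actual patch: instead of hardcoding 3, ask the destination filesystem for its configured default replication (which on HDFS is driven by "dfs.replication") and use that as the fallback. The `dst` path and the surrounding setup here are illustrative; only `spark.yarn.submit.file.replication` and the Hadoop/Spark APIs shown are from the issue's context.

{noformat}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)
// Illustrative destination; in ClientBase this is the staging-dir target.
val dst = new Path("/user/systest/.sparkStaging/app/some.jar")

// FileSystem.getDefaultReplication(Path) returns the filesystem's default
// replication for that path, so a user-set spark.yarn.submit.file.replication
// still wins, but the fallback now matches the cluster instead of "3".
val replication = sparkConf.getInt("spark.yarn.submit.file.replication",
  fs.getDefaultReplication(dst)).toShort
{noformat}

On a cluster where the maximum replication is 1 (as in the stack trace above), this fallback would request replication 1 rather than 3, avoiding the RemoteException.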