[ 
https://issues.apache.org/jira/browse/SPARK-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900821#comment-15900821
 ] 

Saisai Shao commented on SPARK-19812:
-------------------------------------

[~tgraves], I'm not quite sure what you mean here?

bq. The tests are using files rather then directories so it didn't catch. We 
need to fix the test also.

>From my understanding this issues happens when dest dir is not empty and try 
>to move with REPLACE_EXISTING, but I cannot imagine how this happened, because 
>if dest dir is not empty, then it should be returned before, will not go to 
>check old NM local dirs.

> YARN shuffle service fails to relocate recovery DB directories
> --------------------------------------------------------------
>
>                 Key: SPARK-19812
>                 URL: https://issues.apache.org/jira/browse/SPARK-19812
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.1
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> The yarn shuffle service tries to switch from the yarn local directories to 
> the real recovery directory but can fail to move the existing recovery db's.  
> It fails due to Files.move not doing directories that have contents.
> 2017-03-03 14:57:19,558 [main] ERROR yarn.YarnShuffleService: Failed to move 
> recovery file sparkShuffleRecovery.ldb to the path 
> /mapred/yarn-nodemanager/nm-aux-services/spark_shuffle
> java.nio.file.DirectoryNotEmptyException:/yarn-local/sparkShuffleRecovery.ldb
>         at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:498)
>         at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
>         at java.nio.file.Files.move(Files.java:1395)
>         at 
> org.apache.spark.network.yarn.YarnShuffleService.initRecoveryDb(YarnShuffleService.java:369)
>         at 
> org.apache.spark.network.yarn.YarnShuffleService.createSecretManager(YarnShuffleService.java:200)
>         at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:174)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:262)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)
> This used to use f.renameTo and we switched it in the pr due to review 
> comments and it looks like didn't do a final real test. The tests are using 
> files rather then directories so it didn't catch. We need to fix the test 
> also.
> history: 
> https://github.com/apache/spark/pull/14999/commits/65de8531ccb91287f5a8a749c7819e99533b9440



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to