Jeff Field created SPARK-15371:
----------------------------------
Summary: YARNShuffleService doesn't get current local-dirs from
NodeManager
Key: SPARK-15371
URL: https://issues.apache.org/jira/browse/SPARK-15371
Project: Spark
Issue Type: Bug
Components: Shuffle, YARN
Affects Versions: 1.6.1, 1.6.0, 1.5.2, 1.5.1, 1.5.0, 1.6.2, 2.0.0
Reporter: Jeff Field
Priority: Minor
In YarnShuffleService.java, the YarnShuffleService loads the YARN configuration
to get the list of local directories. If it doesn't find an existing LevelDB
file (used for recovery) in any of them, it creates one in the first directory
in the list. Because it reads the list from the config rather than asking YARN
for the current list of healthy local-dirs, if the first directory is one the
NodeManager already knows to be bad, YarnShuffleService will keep trying to
use it.
Removing the bad directory from the config works around this, but Spark should
get the current list from YARN instead of using the list from the config. There
are examples of this in
https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java
but I'm not sure of the right way for Spark to implement that.
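
To make the behavior concrete, here is a minimal sketch of the selection logic as described above. It is not the actual Spark code: the class name RecoveryFileLocator, the method locate, and the file name "recovery.ldb" are hypothetical names chosen for illustration, and it uses plain java.io.File with no Hadoop or YARN APIs. The caller decides which directory list to pass in.

```java
import java.io.File;
import java.util.List;

// Illustrative only: hypothetical names, not taken from YarnShuffleService.java.
final class RecoveryFileLocator {
    // Hypothetical recovery file name used for this sketch.
    static final String RECOVERY_FILE_NAME = "recovery.ldb";

    private RecoveryFileLocator() {}

    /**
     * Returns the recovery file to use for the given local directories.
     * If a recovery file already exists in any directory, that one is returned.
     * Otherwise the file is placed in the FIRST directory of the list, with no
     * check of whether that directory is healthy.
     */
    static File locate(List<String> localDirs) {
        if (localDirs == null || localDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one local dir is required");
        }
        for (String dir : localDirs) {
            File candidate = new File(dir, RECOVERY_FILE_NAME);
            if (candidate.exists()) {
                return candidate;
            }
        }
        // No existing file found: fall back to the first entry in the list.
        return new File(localDirs.get(0), RECOVERY_FILE_NAME);
    }
}
```

With the configured list (say a bad disk first and a good disk second), locate picks the bad disk whenever no recovery file exists yet. Passing only the directories that the NodeManager currently considers healthy would make the same logic pick a usable location; how to obtain that list from YARN is the open question in this issue.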