[
https://issues.apache.org/jira/browse/SPARK-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Terence Yim updated SPARK-13441:
--------------------------------
Description:
An NPE is thrown from the YARN {{Client.scala}} because {{File.listFiles()}} can
return {{null}} for a directory that the process doesn't have permission to
list. This is the code fragment in question:
{noformat}
// In org/apache/spark/deploy/yarn/Client.scala
Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
  sys.env.get(envKey).foreach { path =>
    val dir = new File(path)
    if (dir.isDirectory()) {
      // dir.listFiles() can return null
      dir.listFiles().foreach { file =>
        if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
          hadoopConfFiles(file.getName()) = file
        }
      }
    }
  }
}
{noformat}
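A minimal null-safe rewrite (a sketch only; the fix actually committed to Spark may differ) wraps the {{listFiles()}} result in {{Option}}, so an unreadable directory is simply skipped instead of throwing. {{NullSafeConfScan}} and the local {{hadoopConfFiles}} map are illustrative stand-ins for the code in {{Client.scala}}:

```scala
import java.io.File
import scala.collection.mutable

// Standalone sketch of a null-safe version of the loop; hadoopConfFiles stands
// in for the mutable map used in Client.scala.
object NullSafeConfScan {
  def main(args: Array[String]): Unit = {
    val hadoopConfFiles = mutable.HashMap[String, File]()
    Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
      sys.env.get(envKey).foreach { path =>
        val dir = new File(path)
        if (dir.isDirectory()) {
          // Option(null) is None, so an unreadable directory contributes
          // nothing instead of causing a NullPointerException.
          Option(dir.listFiles()).getOrElse(Array.empty[File]).foreach { file =>
            if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
              hadoopConfFiles(file.getName()) = file
            }
          }
        }
      }
    }
    println(hadoopConfFiles.size)
  }
}
```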
To reproduce, simply do:
{noformat}
sudo mkdir /tmp/conf
sudo chmod 700 /tmp/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/tmp/conf
spark-submit --master yarn-client SimpleApp.py
{noformat}
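The root cause can be seen in isolation: {{File.listFiles()}} returns {{null}}, not an empty array, when the directory cannot be read. A small sketch (assumes a POSIX filesystem and a non-root user, since root can list a directory regardless of its mode bits):

```scala
import java.io.File
import java.nio.file.Files

object ListFilesNullDemo {
  def main(args: Array[String]): Unit = {
    // listFiles() also returns null for a path that does not exist at all
    println(new File("/no/such/dir").listFiles() == null) // true

    // Revoke read permission on a fresh temp dir; as a non-root user,
    // listFiles() then returns null instead of an empty array.
    val dir = Files.createTempDirectory("conf-demo").toFile
    dir.setReadable(false, false)
    println(dir.listFiles() == null)
    dir.setReadable(true, false) // restore so the temp dir can be deleted
    dir.delete()
  }
}
```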
This fails for any Spark app. Though its contents are not important, the
SimpleApp.py I used looks like this:
{noformat}
from pyspark import SparkContext
sc = SparkContext(None, "Simple App")
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)
total = distData.reduce(lambda a, b: a + b)
print("Total: %i" % total)
{noformat}
> NullPointerException when either HADOOP_CONF_DIR or YARN_CONF_DIR is not
> readable
> ---------------------------------------------------------------------------------
>
> Key: SPARK-13441
> URL: https://issues.apache.org/jira/browse/SPARK-13441
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.4.1, 1.5.1, 1.6.0
> Reporter: Terence Yim
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]