Chester created SPARK-12800:
-------------------------------
Summary: Subtle bug on Spark Yarn Client under Kerberos Security Mode
Key: SPARK-12800
URL: https://issues.apache.org/jira/browse/SPARK-12800
Project: Spark
Issue Type: Bug
Affects Versions: 1.5.2, 1.5.1
Reporter: Chester
Version used: Spark 1.5.1 (1.5.2-SNAPSHOT)
Deployment Mode: Yarn-Cluster
Problem observed:
When running a Spark job directly through the YARN Client (without spark-submit;
I have not verified whether spark-submit has the same issue) with Kerberos
security enabled, the first run of the job always fails. The failure occurs
because Hadoop treats the job as being in SIMPLE authentication mode rather
than Kerberos mode. Without shutting down the JVM, running the same job again
succeeds; if the JVM is restarted, the job fails again.
The cause:
Tracking down the source of the issue, the problem appears to lie in Spark's
YARN Client.scala. In the prepareLocalResources() method (around line 266 of
Client.scala), the following line is called:
YarnSparkHadoopUtil.get.obtainTokensForNamenodes(nns, hadoopConf, credentials)
YarnSparkHadoopUtil.get in turn triggers initialization via reflection:
object SparkHadoopUtil {
  private val hadoop = {
    val yarnMode = java.lang.Boolean.valueOf(
      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))
    if (yarnMode) {
      try {
        Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
          .newInstance()
          .asInstanceOf[SparkHadoopUtil]
      } catch {
        case e: Exception => throw new SparkException("Unable to load YARN support", e)
      }
    } else {
      new SparkHadoopUtil
    }
  }

  def get: SparkHadoopUtil = {
    hadoop
  }
}
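The difference between the first and second run comes from Scala's object
initialization semantics: the body of the SparkHadoopUtil companion object
runs exactly once per JVM, on first access. A minimal self-contained sketch
of just that semantics (the names below are illustrative, not Spark's):

```scala
object InitOnceDemo {
  var initCount = 0 // counts how many times the holder object is initialized

  object Holder {
    // This body runs exactly once per JVM, on the first access to Holder.
    initCount += 1
  }

  def get: Holder.type = Holder // mirrors the shape of SparkHadoopUtil.get
}
```

Calling InitOnceDemo.get any number of times leaves initCount at 1, which is
why side effects in the initializer happen only on the first run.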
class SparkHadoopUtil extends Logging {
  private val sparkConf = new SparkConf()
  val conf: Configuration = newConfiguration(sparkConf)
  UserGroupInformation.setConfiguration(conf)
  // ... rest of the class
}
Here SparkHadoopUtil creates an empty SparkConf, builds a Hadoop Configuration
from it, and installs that configuration on UserGroupInformation:
UserGroupInformation.setConfiguration(conf)
Because UserGroupInformation's authentication method is static, this call
wipes out the existing security settings: UserGroupInformation.isSecurityEnabled()
changes from true to false, so subsequent calls fail.
Since SparkHadoopUtil.hadoop is a static, immutable value, it is not created
again on the next run; therefore
UserGroupInformation.setConfiguration(conf)
is not called again, and subsequent Spark jobs in the same JVM succeed.
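To make the failure mode concrete, here is a self-contained model of the
sequence above (plain Scala, no Hadoop classes; all names are illustrative
stand-ins): a static initializer installs an empty configuration, wiping
previously-set security state, and only the first run is affected.

```scala
object UgiModel {
  // Stands in for UserGroupInformation's static authentication state.
  var authMethod: String = "SIMPLE"
  def setConfiguration(method: String): Unit = { authMethod = method }
  def isSecurityEnabled: Boolean = authMethod == "KERBEROS"
}

object HadoopUtilModel {
  // Stands in for SparkHadoopUtil's constructor side effect: it builds an
  // empty configuration, whose default auth method is SIMPLE, and installs it.
  UgiModel.setConfiguration("SIMPLE")
  def get: this.type = this
}

object Scenario {
  def runJob(): Boolean = {
    HadoopUtilModel.get        // first access wipes the Kerberos setting
    UgiModel.isSecurityEnabled // the job succeeds only if still Kerberos
  }
}
```

After UgiModel.setConfiguration("KERBEROS"), the first Scenario.runJob()
returns false, because touching HadoopUtilModel resets the auth method to
SIMPLE; setting Kerberos again and re-running returns true, since the
initializer never runs a second time in the same JVM.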
The workaround:
// First touch SparkHadoopUtil, which creates the static instance and, as a
// side effect, sets UserGroupInformation to an empty Hadoop Configuration.
// We then need to reset UserGroupInformation afterwards.
val util = SparkHadoopUtil.get
UserGroupInformation.setConfiguration(hadoopConf)
Then call
client.run()
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)