Chester created SPARK-12800:
-------------------------------

             Summary: Subtle bug in Spark YARN Client under Kerberos security mode
                 Key: SPARK-12800
                 URL: https://issues.apache.org/jira/browse/SPARK-12800
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.5.2, 1.5.1
            Reporter: Chester


Version used: Spark 1.5.1 (1.5.2-SNAPSHOT)
Deployment Mode: Yarn-Cluster
Problem observed:
  When running a Spark job directly through the YARN Client (without using spark-submit; I did not verify whether spark-submit has the same issue), with Kerberos security enabled, the first run of the Spark job always fails. The failure occurs because Hadoop considers the job to be in SIMPLE mode rather than Kerberos mode. But if the same job is run again without shutting down the JVM, it passes. If the JVM is restarted, the job fails again.

The cause:
  Tracking down the issue, I found that the problem seems to lie in the Spark YARN Client.scala. In the Client's prepareLocalResources() method (around line 266 of Client.scala), the following line is called:

 YarnSparkHadoopUtil.get.obtainTokensForNamenodes(nns, hadoopConf, credentials)
    
The YarnSparkHadoopUtil.get is in turn initialized via reflection:


object SparkHadoopUtil {

  private val hadoop = {
    val yarnMode = java.lang.Boolean.valueOf(
        System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))
    if (yarnMode) {
      try {
        Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
          .newInstance()
          .asInstanceOf[SparkHadoopUtil]
      } catch {
        case e: Exception => throw new SparkException("Unable to load YARN support", e)
      }
    } else {
      new SparkHadoopUtil
    }
  }

  def get: SparkHadoopUtil = {
    hadoop
  }
}

 

class SparkHadoopUtil extends Logging {
  private val sparkConf = new SparkConf()
  val conf: Configuration = newConfiguration(sparkConf)
  UserGroupInformation.setConfiguration(conf)

  // ... rest of the class
}

Here SparkHadoopUtil creates an empty SparkConf, builds a Hadoop Configuration from it, and sets that configuration on UserGroupInformation:

  UserGroupInformation.setConfiguration(conf)

  Since UserGroupInformation's authentication method is static, the above wipes out the security settings: UserGroupInformation.isSecurityEnabled() changes from true to false, and the subsequent calls fail.
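A minimal, self-contained Scala analogue can illustrate this static-state wipe. It does not use the real Hadoop classes; GlobalAuth and BadSingleton are hypothetical stand-ins for UserGroupInformation's static state and for SparkHadoopUtil's side-effecting constructor:

```scala
// Hypothetical stand-in for UserGroupInformation's static authentication state.
object GlobalAuth {
  var method: String = "SIMPLE"
  def setConfiguration(conf: Map[String, String]): Unit =
    method = conf.getOrElse("hadoop.security.authentication", "SIMPLE")
  def isSecurityEnabled: Boolean = method == "kerberos"
}

// Stand-in for SparkHadoopUtil: its constructor builds an *empty* configuration
// and pushes it into the global state, wiping whatever was set before.
class BadSingleton {
  GlobalAuth.setConfiguration(Map.empty) // empty conf => falls back to SIMPLE
}

object Demo {
  def main(args: Array[String]): Unit = {
    // The caller configures Kerberos first...
    GlobalAuth.setConfiguration(Map("hadoop.security.authentication" -> "kerberos"))
    println(GlobalAuth.isSecurityEnabled) // true
    // ...then the singleton is first touched, and its initializer wipes it.
    new BadSingleton
    println(GlobalAuth.isSecurityEnabled) // false -- the reported failure mode
  }
}
```

The key point is that the wipe happens as a side effect of merely constructing the singleton, exactly like UserGroupInformation.setConfiguration(conf) inside SparkHadoopUtil's constructor.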

  Since SparkHadoopUtil.hadoop is a static, immutable val, it is not created again on the next run, so UserGroupInformation.setConfiguration(conf) is not called again, and the subsequent Spark job works.
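This once-per-JVM behavior of a val inside a Scala object can be demonstrated with a small sketch (OncePerJvm and CountDemo are hypothetical names, not Spark code):

```scala
// A val inside an object is initialized at most once per JVM, so any side
// effects in its initializer happen only on the first access.
object OncePerJvm {
  var initCount = 0
  private val cached = { initCount += 1; "initialized" }
  def get: String = cached
}

object CountDemo {
  def main(args: Array[String]): Unit = {
    OncePerJvm.get
    OncePerJvm.get
    println(OncePerJvm.initCount) // 1: the initializer ran only on first access
  }
}
```

This is why, within one JVM, only the very first Spark job sees the wiped configuration and every later run passes.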

The work around:
        // First initialize SparkHadoopUtil, which creates a static instance
        // that sets UserGroupInformation to an empty Hadoop Configuration.
        // We then need to reset UserGroupInformation with the real configuration.
        val util = SparkHadoopUtil.get
        UserGroupInformation.setConfiguration(hadoopConf)

      Then call
          client.run()
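The ordering that makes the workaround effective can be illustrated with the same kind of self-contained Scala analogue (Auth, Singleton, and Workaround are hypothetical stand-ins, not Spark or Hadoop classes):

```scala
// Hypothetical global state, analogous to UserGroupInformation.
object Auth {
  var method: String = "SIMPLE"
  def isSecurityEnabled: Boolean = method == "kerberos"
}

// One-time-initialized singleton whose initializer wipes the global state,
// analogous to SparkHadoopUtil.
object Singleton {
  private val instance = { Auth.method = "SIMPLE"; new Object }
  def get: Object = instance
}

object Workaround {
  def main(args: Array[String]): Unit = {
    Singleton.get            // force the one-time initializer to run *now*
    Auth.method = "kerberos" // then restore the real security setting
    // client.run() would be called here; the setting can no longer be wiped.
    println("security enabled: " + Auth.isSecurityEnabled)
  }
}
```

Forcing the singleton's initializer to run first guarantees that the later reset of the configuration cannot be undone by a lazy first access during the job.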

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
