[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968343#comment-15968343 ]

Marcelo Vanzin commented on SPARK-20328:
----------------------------------------

Hmm... that seems related to delegation token support. Delegation tokens need a 
"renewer", and in YARN applications the renewer is generally the YARN service 
(IIRC), since it takes care of renewing the delegation tokens submitted with 
your application (and cancelling them after the application is done).
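
For reference, outside of {{TokenCache}} the renewer can be passed explicitly 
when the tokens are fetched; a rough sketch, assuming {{fs.defaultFS}} points 
at a Kerberized HDFS (the principal here is just a placeholder):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// Sketch: fetch HDFS delegation tokens with an explicit renewer instead
// of letting TokenCache derive one from the MR/YARN master config.
// "yarn/_HOST@EXAMPLE.COM" is a placeholder, not a real principal.
val conf = new Configuration()
val creds = new Credentials()
val fs = FileSystem.get(conf)
// addDelegationTokens(renewer, credentials) returns the tokens it added
val tokens = fs.addDelegationTokens("yarn/_HOST@EXAMPLE.COM", creds)
tokens.foreach(t => println(s"fetched ${t.getKind} for ${t.getService}"))
{code}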

In your case Mesos doesn't know about Kerberos, so the user submitting the app 
needs to be the renewer; and, aside from this particular issue, you may need to 
add code that actually renews those tokens periodically (which is different 
from creating new tokens after they reach their max lifetime).
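
Such renewal could look roughly like this (a sketch only; real code would 
schedule per token based on the expiry time that {{renew()}} returns, and the 
1-hour period here is a guess):

{code}
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Sketch: renew every delegation token held by the current user on a
// fixed schedule, before the tokens expire.
val conf = new Configuration()
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = {
    val creds = UserGroupInformation.getCurrentUser.getCredentials
    creds.getAllTokens.asScala.foreach { token =>
      val newExpiry = token.renew(conf) // returns the new expiration time
      println(s"renewed ${token.getKind} until $newExpiry")
    }
  }
}, 1, 1, TimeUnit.HOURS)
{code}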

I don't think you'll find a way around this other than the one you already have 
(setting the YARN configs); you just need to make the Mesos backend in Spark do 
that automatically for the submitting user. As for the Hadoop library, you 
could open a bug asking for an explicit option that lets non-MR, non-YARN 
applications set the renewer more easily.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -----------------------------------------------------------------
>
>                 Key: SPARK-20328
>                 URL: https://issues.apache.org/jira/browse/SPARK-20328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.1.1, 2.1.2
>            Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
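> A paraphrase of that code path (simplified, not verbatim; the input path is a 
> placeholder):
> {code}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
>
> // The Hadoop Configuration is wrapped in a MapReduce JobConf purely to
> // compute input splits.
> val jobConf = new JobConf(new Configuration())
> FileInputFormat.setInputPaths(jobConf, new Path("/some/input"))
> val inputFormat = new TextInputFormat()
> inputFormat.configure(jobConf)
> // getSplits -> listStatus -> TokenCache.obtainTokensForNamenodes(...),
> // i.e. the MapReduce-specific security code described next.
> val splits = inputFormat.getSplits(jobConf, 2)
> {code}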
> Semantically, this is a problem because a {{HadoopRDD}} does not represent a 
> Hadoop MapReduce job. Practically, it is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
> results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130
> That code assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}); if it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer
>       at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>       at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>       at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>       at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>       at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>       at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
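> Concretely, the workaround amounts to something like this (a sketch; the key 
> {{TokenCache}} checks is presumably {{yarn.resourcemanager.principal}}, and 
> the principal value below is a placeholder):
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Pre-set the YARN RM principal so TokenCache finds a "master Kerberos
> // principal" and doesn't throw. spark.hadoop.* keys are copied into the
> // Hadoop Configuration that HadoopRDD ends up wrapping in a JobConf.
> val conf = new SparkConf()
>   .setAppName("kerberos-workaround")
>   .set("spark.hadoop.yarn.resourcemanager.principal", "yarn/_HOST@EXAMPLE.COM")
> val sc = new SparkContext(conf)
> println(sc.textFile("hdfs:///secure/data").count()) // reads via HadoopRDD
> {code}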
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.


