Repository: spark
Updated Branches:
  refs/heads/master 8c70cb4c6 -> 553fd7b91
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

https://issues.apache.org/jira/browse/SPARK-12654

The bug here is that WholeTextFileRDD.getPartitions has:

  val conf = getConf

In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new JobContext. The new JobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when getConf clones the Hadoop configuration, it changes it from a JobConf to a Configuration and drops the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works.

Author: Thomas Graves <[email protected]>

Closes #10651 from tgravescs/SPARK-12654.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/553fd7b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/553fd7b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/553fd7b9

Branch: refs/heads/master
Commit: 553fd7b912a32476b481fd3f80c1d0664b6c6484
Parents: 8c70cb4
Author: Thomas Graves <[email protected]>
Authored: Fri Jan 8 14:38:19 2016 -0600
Committer: Tom Graves <[email protected]>
Committed: Fri Jan 8 14:38:19 2016 -0600

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/553fd7b9/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
index 146609a..7a11978 100644
--- a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
@@ -24,6 +24,7 @@ import scala.reflect.ClassTag

 import org.apache.hadoop.conf.{Configurable, Configuration}
 import org.apache.hadoop.io.Writable
+import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.mapreduce._
 import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}
 import org.apache.hadoop.mapreduce.task.{JobContextImpl, TaskAttemptContextImpl}
@@ -93,7 +94,13 @@ class NewHadoopRDD[K, V](
       // issues, this cloning is disabled by default.
       NewHadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
         logDebug("Cloning Hadoop Configuration")
-        new Configuration(conf)
+        // The Configuration passed in is actually a JobConf and possibly contains credentials.
+        // To keep those credentials properly we have to create a new JobConf not a Configuration.
+        if (conf.isInstanceOf[JobConf]) {
+          new JobConf(conf)
+        } else {
+          new Configuration(conf)
+        }
       }
     } else {
       conf
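For context on why the JobConf branch matters, the cloning behavior can be sketched in isolation. This is an illustrative sketch, not code from the commit; cloneHadoopConf is a hypothetical helper name used only for illustration:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapred.JobConf

  object CloneConfSketch {
    // Illustrative sketch only. A JobConf carries credentials (e.g. delegation
    // tokens on secure Hadoop); a plain Configuration does not. Copying a JobConf
    // via new Configuration(conf) therefore silently drops those tokens, which is
    // the failure this commit fixes. Cloning via new JobConf(conf) keeps them.
    def cloneHadoopConf(conf: Configuration): Configuration = conf match {
      case jobConf: JobConf => new JobConf(jobConf)     // credentials preserved
      case plain            => new Configuration(plain) // no credentials to lose
    }
  }

The commit's version of this check lives inside getConf's synchronized block, as shown in the diff above.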
