Repository: spark
Updated Branches:
  refs/heads/master 8c70cb4c6 -> 553fd7b91
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

https://issues.apache.org/jira/browse/SPARK-12654

The bug here is that WholeTextFileRDD.getPartitions has:

  val conf = getConf

In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new JobContext. The new JobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when getConf clones the Hadoop configuration, it changes it from a JobConf to a Configuration and drops the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works.

Author: Thomas Graves <[email protected]>

Closes #10651 from tgravescs/SPARK-12654.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/553fd7b9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/553fd7b9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/553fd7b9

Branch: refs/heads/master
Commit: 553fd7b912a32476b481fd3f80c1d0664b6c6484
Parents: 8c70cb4
Author: Thomas Graves <[email protected]>
Authored: Fri Jan 8 14:38:19 2016 -0600
Committer: Tom Graves <[email protected]>
Committed: Fri Jan 8 14:38:19 2016 -0600

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/553fd7b9/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
index 146609a..7a11978 100644
--- a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
@@ -24,6 +24,7 @@ import scala.reflect.ClassTag

 import org.apache.hadoop.conf.{Configurable, Configuration}
 import org.apache.hadoop.io.Writable
+import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.mapreduce._
 import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}
 import org.apache.hadoop.mapreduce.task.{JobContextImpl, TaskAttemptContextImpl}
@@ -93,7 +94,13 @@ class NewHadoopRDD[K, V](
       // issues, this cloning is disabled by default.
       NewHadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
         logDebug("Cloning Hadoop Configuration")
-        new Configuration(conf)
+        // The Configuration passed in is actually a JobConf and possibly contains credentials.
+        // To keep those credentials properly we have to create a new JobConf not a Configuration.
+        if (conf.isInstanceOf[JobConf]) {
+          new JobConf(conf)
+        } else {
+          new Configuration(conf)
+        }
       }
     } else {
       conf
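For context on why the JobConf branch matters, the cloning behavior can be sketched in isolation. This is an illustrative sketch, not code from the commit; cloneHadoopConf is a hypothetical helper name used only for illustration:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapred.JobConf

  object CloneConfSketch {
    // Illustrative sketch only. A JobConf carries credentials (e.g. delegation
    // tokens on secure Hadoop); a plain Configuration does not. Copying a JobConf
    // via new Configuration(conf) therefore silently drops those tokens, which is
    // the failure this commit fixes. Cloning via new JobConf(conf) keeps them.
    def cloneHadoopConf(conf: Configuration): Configuration = conf match {
      case jobConf: JobConf => new JobConf(jobConf)     // credentials preserved
      case plain            => new Configuration(plain) // no credentials to lose
    }
  }

The commit's version of this check lives inside getConf's synchronized block, as shown in the diff above.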
