[
https://issues.apache.org/jira/browse/HIVE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346916#comment-16346916
]
Rui Li commented on HIVE-18442:
-------------------------------
In yarn-cluster mode, the user jar (i.e. hive-exec.jar) won't be put in the
AM's system class path unless it's added to the driver's extra class path or
we enable {{spark.yarn.user.classpath.first}}:
[https://github.com/apache/spark/blob/v2.2.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1229]
The AM in yarn-cluster mode is ApplicationMaster. It loads hive-exec.jar
and runs our RemoteDriver in a dedicated thread. However, ApplicationMaster
somehow triggers {{FileSystem::loadFileSystems}} before it launches this
thread, so we miss the chance to register the NullScanFileSystem.
Yarn-client mode faces the same potential issue, because hive-exec.jar is
not in the system class path when the JVM starts (in yarn-client, the
RemoteDriver runs in SparkSubmit). But SparkSubmit doesn't trigger
{{FileSystem::loadFileSystems}} before it runs RemoteDriver, which means we're
just lucky in that case.
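To illustrate the ordering problem described above, here is a minimal toy model in plain Java. It is a sketch, not Hadoop's or Hive's code: {{OneShotSchemeRegistry}}, {{LoadOrderDemo}} and all method names are hypothetical, and it assumes the registry scans for implementations only once per JVM, so anything that becomes visible after that scan is never registered.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a registry that scans for providers exactly once. */
final class OneShotSchemeRegistry {
    // Models "what the class loader can see right now".
    private final List<String> visibleSchemes = new ArrayList<>();
    // Schemes captured by the single scan.
    private final Map<String, String> registered = new HashMap<>();
    private boolean scanned = false;

    /** Models a jar (and the scheme it provides) becoming visible. */
    void makeVisible(String scheme) {
        visibleSchemes.add(scheme);
    }

    /** Models the one-time scan; every later call is a no-op. */
    void scanOnce() {
        if (scanned) {
            return;
        }
        for (String scheme : visibleSchemes) {
            registered.put(scheme, scheme + "-impl");
        }
        scanned = true;
    }

    /** Resolves a scheme, triggering the scan on first use. */
    String lookup(String scheme) throws IOException {
        scanOnce();
        String impl = registered.get(scheme);
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}

final class LoadOrderDemo {
    /** Returns true if "nullscan" resolves under the given ordering. */
    static boolean resolves(boolean scanBeforeJarVisible) {
        OneShotSchemeRegistry registry = new OneShotSchemeRegistry();
        registry.makeVisible("hdfs");
        if (scanBeforeJarVisible) {
            // Ordering described for yarn-cluster: the scan runs first.
            registry.scanOnce();
        }
        // Models hive-exec.jar being loaded afterwards.
        registry.makeVisible("nullscan");
        try {
            registry.lookup("nullscan");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("scan before jar is visible: " + resolves(true));
        System.out.println("jar visible before scan:    " + resolves(false));
    }
}
```

In this model the lookup fails only when the scan runs before the jar becomes visible, which corresponds to the yarn-cluster ordering; the yarn-client ordering happens to work because nothing triggers the scan early.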
> HoS: No FileSystem for scheme: nullscan
> ---------------------------------------
>
> Key: HIVE-18442
> URL: https://issues.apache.org/jira/browse/HIVE-18442
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Rui Li
> Assignee: Rui Li
> Priority: Major
> Attachments: HIVE-18442.1.patch
>
>
> Hit the issue when I run the following query in yarn-cluster mode:
> {code}
> select * from (select key from src where false) a left outer join (select key
> from srcpart limit 0) b on a.key=b.key;
> {code}
> Stack trace:
> {noformat}
> Job failed with java.io.IOException: No FileSystem for scheme: nullscan
>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
>     at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2605)
>     at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2601)
>     at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3409)
>     at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3347)
>     at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.cloneJobConf(SparkPlanGenerator.java:299)
>     at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:222)
>     at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:109)
>     at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:354)
>     at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:358)
>     at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}