[ https://issues.apache.org/jira/browse/HIVE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346916#comment-16346916 ]

Rui Li commented on HIVE-18442:
-------------------------------

In yarn-cluster mode, the user jar (i.e. hive-exec.jar) is not put on the
AM's system class path
[https://github.com/apache/spark/blob/v2.2.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1229]
unless it's added to the driver's extra class path or we enable
{{spark.yarn.user.classpath.first}}.
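Both options above are ordinary Spark properties that can be passed to spark-submit with {{--conf}}. As a minimal sketch, assuming a launcher that shells out to spark-submit, the hypothetical helper below ({{SubmitArgs.yarnClusterArgs}} is an illustrative name, not Hive code) assembles such a command line. The jar path is a placeholder, and this is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher helper (not Hive's actual code): builds spark-submit
// arguments for a yarn-cluster launch of the RemoteDriver.
class SubmitArgs {
    static List<String> yarnClusterArgs(String userJar,
                                        boolean userClassPathFirst,
                                        String driverExtraClassPath) {
        List<String> args = new ArrayList<>();
        args.add("--master");
        args.add("yarn");
        args.add("--deploy-mode");
        args.add("cluster");
        args.add("--class");
        args.add("org.apache.hive.spark.client.RemoteDriver");
        if (userClassPathFirst) {
            // Option 1: have the YARN client put the user jar on the AM's
            // class path, ahead of Spark's own jars.
            args.add("--conf");
            args.add("spark.yarn.user.classpath.first=true");
        }
        if (driverExtraClassPath != null) {
            // Option 2: prepend an entry to the driver's class path. The path is
            // not shipped to the cluster; it must already exist on the AM's node.
            args.add("--conf");
            args.add("spark.driver.extraClassPath=" + driverExtraClassPath);
        }
        // Primary resource goes last; RemoteDriver's own arguments are omitted.
        args.add(userJar);
        return args;
    }

    public static void main(String[] argv) {
        List<String> args = yarnClusterArgs("/path/to/hive-exec.jar", true, null);
        System.out.println("spark-submit " + String.join(" ", args));
    }
}
```

Note the trade-off: {{spark.yarn.user.classpath.first}} changes class precedence for the whole AM, while the extra class path only adds one entry but relies on the jar being pre-installed on every node.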

The AM in yarn-cluster mode is the ApplicationMaster. It loads hive-exec.jar
and runs our RemoteDriver in a dedicated thread. However, the ApplicationMaster
somehow triggers {{FileSystem::loadFileSystems}} before it launches this
thread, so we miss the chance to register the NullScanFileSystem: the
FileSystem implementations are only loaded once, so the nullscan scheme is
never picked up after hive-exec.jar becomes available.

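The failure comes down to a "scan once, never rescan" table. The sketch below is a hypothetical model of that behaviour only ({{SchemeRegistry}} and its methods are illustrative names, not Hadoop's FileSystem code, and class loaders are ignored entirely): a scheme that becomes available after the first scan is never found.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a scheme table that is filled exactly once.
// Not Hadoop code; single-threaded use only.
class SchemeRegistry {
    // Implementations that a scan would find right now.
    private final Map<String, String> visible = new HashMap<>();
    // Snapshot taken by the first (and only) scan.
    private final Map<String, String> registered = new HashMap<>();
    private boolean loaded = false;

    void makeVisible(String scheme, String implClass) {
        visible.put(scheme, implClass);
    }

    // Analogue of FileSystem::loadFileSystems: a no-op after the first call.
    private void loadOnce() {
        if (!loaded) {
            registered.putAll(visible);
            loaded = true;
        }
    }

    Optional<String> find(String scheme) {
        loadOnce();
        return Optional.ofNullable(registered.get(scheme));
    }
}

class YarnClusterOrderDemo {
    public static void main(String[] args) {
        SchemeRegistry fs = new SchemeRegistry();
        fs.makeVisible("hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");

        // 1. The ApplicationMaster uses FileSystem first, so the scan happens now.
        fs.find("hdfs");

        // 2. Only afterwards does hive-exec.jar become available to the RemoteDriver.
        fs.makeVisible("nullscan", "org.apache.hadoop.hive.ql.io.NullScanFileSystem");

        // 3. The lookup misses because the snapshot is never refreshed.
        Optional<String> impl = fs.find("nullscan");
        System.out.println(impl.isPresent()
                ? "found " + impl.get()
                : "No FileSystem for scheme: nullscan");
    }
}
```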
Yarn-client mode faces the same potential issue, because hive-exec.jar is not
on the system class path when the JVM starts either (in yarn-client mode, the
RemoteDriver runs inside SparkSubmit). But SparkSubmit doesn't trigger
{{FileSystem::loadFileSystems}} before it runs the RemoteDriver, so we're
just lucky in that case.
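Whether the lookup succeeds depends only on the order of two events, which is why yarn-client mode works by luck rather than by design. A minimal sketch, assuming the scheme table is filled once at the first FileSystem use; the {{Step}} enum and {{StartupOrder}} class are hypothetical, and "visible" is modeled loosely without real class loaders.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical startup events, reduced to what matters for the nullscan lookup.
enum Step { FIRST_FILESYSTEM_USE, HIVE_EXEC_VISIBLE, NULLSCAN_LOOKUP }

class StartupOrder {
    // Returns true if a nullscan lookup would succeed, assuming the scheme table
    // is filled exactly once at the first FileSystem use (a nullscan lookup is
    // itself a FileSystem use).
    static boolean nullscanResolves(List<Step> steps) {
        boolean jarVisible = false;
        Boolean jarVisibleAtScan = null; // null until the one-time scan has run
        for (Step s : steps) {
            if (s == Step.HIVE_EXEC_VISIBLE) {
                jarVisible = true;
                continue;
            }
            if (jarVisibleAtScan == null) {
                jarVisibleAtScan = jarVisible; // the one-time scan happens here
            }
            if (s == Step.NULLSCAN_LOOKUP) {
                return jarVisibleAtScan;
            }
        }
        throw new IllegalArgumentException("sequence contains no NULLSCAN_LOOKUP");
    }

    public static void main(String[] args) {
        // yarn-cluster: the ApplicationMaster touches FileSystem before hive-exec.jar is loaded.
        List<Step> yarnCluster = Arrays.asList(
                Step.FIRST_FILESYSTEM_USE, Step.HIVE_EXEC_VISIBLE, Step.NULLSCAN_LOOKUP);
        // yarn-client: nothing touches FileSystem before the RemoteDriver runs.
        List<Step> yarnClient = Arrays.asList(
                Step.HIVE_EXEC_VISIBLE, Step.NULLSCAN_LOOKUP);
        System.out.println("yarn-cluster: " + nullscanResolves(yarnCluster));
        System.out.println("yarn-client:  " + nullscanResolves(yarnClient));
    }
}
```

A single early FileSystem call in SparkSubmit would turn the yarn-client sequence into the yarn-cluster one, which is exactly the fragility described above.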

> HoS: No FileSystem for scheme: nullscan
> ---------------------------------------
>
>                 Key: HIVE-18442
>                 URL: https://issues.apache.org/jira/browse/HIVE-18442
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>            Priority: Major
>         Attachments: HIVE-18442.1.patch
>
>
> I hit the issue when running the following query in yarn-cluster mode:
> {code}
> select * from (select key from src where false) a
> left outer join (select key from srcpart limit 0) b
> on a.key=b.key;
> {code}
> Stack trace:
> {noformat}
> Job failed with java.io.IOException: No FileSystem for scheme: nullscan
>       at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
>       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>       at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>       at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>       at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
>       at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2605)
>       at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2601)
>       at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3409)
>       at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3347)
>       at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.cloneJobConf(SparkPlanGenerator.java:299)
>       at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:222)
>       at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:109)
>       at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:354)
>       at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:358)
>       at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
