[ https://issues.apache.org/jira/browse/HIVE-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829905#comment-15829905 ]

Xuefu Zhang commented on HIVE-15659:
------------------------------------

[~csun], do you know whether the StackOverflowError happens on the driver or 
on the executors? Secondly, I'm not sure that Spark loads additional jars for 
each input file; to me, it seems to be per task.

> StackOverflowError when ClassLoader.loadClass for Spark
> -------------------------------------------------------
>
>                 Key: HIVE-15659
>                 URL: https://issues.apache.org/jira/browse/HIVE-15659
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>
> Sometimes a query needs to process a large number of input files, which could 
> cause the following error:
> {code}
> 17/01/15 09:31:52 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, hadoopworker1344-sjc1.prod.uber.internal): 
> java.lang.StackOverflowError
>         at 
> java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1535)
>         at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:463)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
> {code}
> The cause, I think, is that for each input file we may need to load 
> additional jars into the class loader of the current thread. This accumulates 
> with the number of input files: each time jars are added, a new class loader 
> is created with the old class loader as its parent, so the parent chain keeps 
> growing and loadClass eventually recurses past the stack limit. 
> See 
> [Utilities#getBaseWork|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L388]
>  for more details.
> One possible solution is to detect duplicated jar paths before creating the 
> new class loader.
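> For illustration, here is a minimal sketch (not Hive's actual 
> Utilities#getBaseWork code; the jar path and class names are hypothetical) of 
> how wrapping the current loader for every input file deepens the parent 
> chain, and how deduplicating jar paths would keep it flat:
> {code}
> import java.net.URL;
> import java.net.URLClassLoader;
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.Set;
> 
> // A minimal sketch of the class-loader chaining problem described above.
> public class ClassLoaderChainSketch {
> 
>     // Problematic pattern: every call wraps the previous loader, so the
>     // parent chain grows with the number of input files processed.
>     static ClassLoader addJarsNaively(ClassLoader current, URL[] jars) {
>         return new URLClassLoader(jars, current); // parent = old loader
>     }
> 
>     // Possible fix sketched in the description: only create a new loader
>     // for jar paths that are not already on the chain.
>     static ClassLoader addJarsDeduped(ClassLoader current, URL[] jars,
>                                       Set<URL> alreadyAdded) {
>         URL[] newJars = Arrays.stream(jars)
>                 .filter(alreadyAdded::add) // Set.add returns false for duplicates
>                 .toArray(URL[]::new);
>         return newJars.length == 0 ? current
>                                    : new URLClassLoader(newJars, current);
>     }
> 
>     public static void main(String[] args) throws Exception {
>         URL jar = new URL("file:///tmp/some-udf.jar"); // hypothetical jar path
>         ClassLoader naive = ClassLoaderChainSketch.class.getClassLoader();
>         ClassLoader deduped = naive;
>         Set<URL> seen = new HashSet<>();
> 
>         // Simulate many input files that all reference the same jar.
>         for (int i = 0; i < 10_000; i++) {
>             naive = addJarsNaively(naive, new URL[]{jar});           // depth grows
>             deduped = addJarsDeduped(deduped, new URL[]{jar}, seen); // depth stays 1
>         }
>         // loadClass on 'naive' must walk ~10,000 parents before any lookup
>         // succeeds and can overflow the stack; 'deduped' delegates through a
>         // single extra loader.
>         System.out.println("naive chain head:   " + naive);
>         System.out.println("deduped chain head: " + deduped);
>     }
> }
> {code}
> Since loadClass delegates to its parent first, a chain of N loaders means N 
> nested loadClass frames per lookup, which matches the repeated 
> ClassLoader.loadClass frames in the trace above.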



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
