shanyu zhao created SPARK-31029:
-----------------------------------

             Summary: Occasional class not found error in user's Future code 
using global ExecutionContext
                 Key: SPARK-31029
                 URL: https://issues.apache.org/jira/browse/SPARK-31029
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 2.4.5
            Reporter: shanyu zhao


*Problem:*
When running the TPC-DS benchmark (https://github.com/databricks/spark-sql-perf), 
we occasionally see a class-not-found error:

2020-02-04 20:00:26,673 ERROR yarn.ApplicationMaster: User class threw 
exception: scala.ScalaReflectionException: class 
com.databricks.spark.sql.perf.ExperimentRun in JavaMirror with 
sun.misc.Launcher$AppClassLoader@28ba21f3 of type class 
sun.misc.Launcher$AppClassLoader with classpath [...] 
and parent being sun.misc.Launcher$ExtClassLoader@3ff5d147 of type class 
sun.misc.Launcher$ExtClassLoader with classpath [...] 
and parent being primordial classloader with boot classpath [...] not found.

*Root cause:*
The Spark driver starts the ApplicationMaster in the main thread, which starts 
a user thread and sets MutableURLClassLoader as that thread's 
ContextClassLoader:
        userClassThread = startUserApplication()

The main thread then sets up the YarnSchedulerBackend RPC endpoints, which 
handle these calls using Scala Futures on the default global ExecutionContext:
    - doRequestTotalExecutors
    - doKillExecutors

A thread pool thread inherits the ContextClassLoader of the thread that 
triggers its creation. If the main thread starts a future to handle 
doKillExecutors() before the user thread starts any future, the default thread 
pool's thread gets the default ContextClassLoader (AppClassLoader). 
If the user thread starts a future first, the thread pool thread gets 
MutableURLClassLoader.
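
The ordering dependence described above can be reproduced outside Spark. The 
following is a minimal sketch, not Spark code: it assumes a plain 
single-thread java.util.concurrent pool instead of the global 
ExecutionContext, and ContextLoaderDemo, onThreadWith, loaderInPool and 
loaderSeenByLaterTask are hypothetical names made up for illustration. The 
pool's worker is created by whichever thread submits first, and later tasks 
see that thread's ContextClassLoader regardless of who submits them.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}

object ContextLoaderDemo {
  // Runs `body` on a fresh thread whose ContextClassLoader is `loader`.
  private def onThreadWith[T](loader: ClassLoader)(body: => T): T = {
    var result: Option[T] = None
    val t = new Thread(new Runnable {
      def run(): Unit = {
        Thread.currentThread().setContextClassLoader(loader)
        result = Some(body)
      }
    })
    t.start()
    t.join()
    result.get
  }

  // Asks the pool which ContextClassLoader its worker thread has.
  private def loaderInPool(pool: ExecutorService): ClassLoader =
    pool.submit(new Callable[ClassLoader] {
      def call(): ClassLoader = Thread.currentThread().getContextClassLoader
    }).get()

  // A thread using `first` submits the first task (which creates the worker);
  // a thread using `later` then submits a second task to the same worker.
  // Returns the loader observed by that second task.
  def loaderSeenByLaterTask(first: ClassLoader, later: ClassLoader): ClassLoader = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      onThreadWith(first)(loaderInPool(pool))
      onThreadWith(later)(loaderInPool(pool))
    } finally pool.shutdown()
  }
}
```

Here `first` plays the role of the main thread's AppClassLoader and `later` 
the user thread's MutableURLClassLoader: the second task still runs with 
`first`, so a class only `later` can load would not be found through the 
ContextClassLoader.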

So if the user's code runs a future that references a user-provided class 
(which only MutableURLClassLoader can load), and an executor is lost before 
that future starts, you will see class-not-found errors.

*Proposed Solution:*
Set the same class loader (userClassLoader) on both the main thread and the 
user thread in ApplicationMaster.scala.
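
As a rough illustration of the idea (not the actual ApplicationMaster 
change), the sketch below sets one loader on the main thread before any 
future is started, so the pool thread ends up with that loader even when the 
main thread creates it. Everything here is an assumption for the example: 
SharedLoaderSketch is a hypothetical name, an empty URLClassLoader stands in 
for MutableURLClassLoader, and a single-thread pool replaces the global 
ExecutionContext to keep the run deterministic.

```scala
import java.net.{URL, URLClassLoader}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SharedLoaderSketch {
  // Returns true if futures started from both the "main" thread and a
  // "user" thread run with the shared user class loader.
  def run(): Boolean = {
    // Stand-in for the user class loader (MutableURLClassLoader in Spark).
    val userClassLoader: ClassLoader =
      new URLClassLoader(Array.empty[URL], getClass.getClassLoader)
    val mainThread = Thread.currentThread()
    val original = mainThread.getContextClassLoader
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // The proposed change: the main thread uses the same loader as the
      // user thread, and this happens before any pool thread exists.
      mainThread.setContextClassLoader(userClassLoader)

      // The main thread starts the first future, which creates the pool thread.
      val seenFromMain =
        Await.result(Future(Thread.currentThread().getContextClassLoader), 10.seconds)

      // A user thread with the same loader starts a later future.
      var seenFromUser: ClassLoader = null
      val userThread = new Thread(new Runnable {
        def run(): Unit = {
          seenFromUser =
            Await.result(Future(Thread.currentThread().getContextClassLoader), 10.seconds)
        }
      })
      userThread.setContextClassLoader(userClassLoader)
      userThread.start()
      userThread.join()

      (seenFromMain eq userClassLoader) && (seenFromUser eq userClassLoader)
    } finally {
      mainThread.setContextClassLoader(original)
      pool.shutdown()
    }
  }
}
```

With both threads sharing the loader, the order in which they start their 
first future no longer affects which ContextClassLoader the pool thread gets.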


