Hi all,

In the process of trying to resolve SPARK-3166 (inability to ship custom
serialisers in application jars;
https://issues.apache.org/jira/browse/SPARK-3166), I've discovered that
there's some duplicated code for building the command that launches
Executors, spread across SparkDeploySchedulerBackend.scala,
MesosSchedulerBackend.scala, and CoarseMesosSchedulerBackend.scala.

Importantly, there is a slight difference in their behaviour:
SparkDeploySchedulerBackend doesn't launch the Executor with the
spark-class script, but instead does something similar in
CommandUtils.scala, whereas MesosSchedulerBackend.scala and
CoarseMesosSchedulerBackend.scala both use the spark-class script.  Is the
latter the preferred approach?  That is, should I refactor all of these to
use spark-class, or is there a reason for the differing behaviour?

Secondly, the goal of SPARK-3166 is to have the user jar available to the
executor process at launch time (rather than when the first task is
received).  I'd like to get some feedback on what the preferred classpath
order should be.  The items to be ordered to determine the classpath are:

* The output of the compute-classpath script
* The config option spark.executor.extraClassPath
* The application jar (and anything added via SparkContext.addJar)

Complicating the matter is that the 'deploy' backend currently supports the
"spark.files.userClassPathFirst" option, but this is not supported by the
Mesos backends (and I don't think it's supported by the YARN backend).

Ignoring the "userClassPathFirst" option, the current behaviour for the
classpath is effectively:
1. The output of compute-classpath
2. The config option spark.executor.extraClassPath
3. The application jar (and anything added via SparkContext.addJar).

What should the preferred order be if userClassPathFirst is true?
Currently the behaviour for the Deploy backend is effectively:
1. The application jar (and anything added via SparkContext.addJar)
2. The output of compute-classpath
3. The config option spark.executor.extraClassPath

To me it makes more sense for this to be in the order (application jar;
spark.executor.extraClassPath; compute-classpath).  Agree? Disagree?
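
To make the alternatives concrete, here is a minimal sketch of the
orderings under discussion.  The object and parameter names are purely
illustrative (not Spark's actual API); the three inputs correspond to the
three classpath sources listed above:

```scala
// Hypothetical helper illustrating the classpath orderings under discussion.
// None of these names exist in Spark; they are for illustration only.
object ClasspathOrder {
  def executorClasspath(
      computeClasspath: Seq[String],   // output of the compute-classpath script
      extraClassPath: Seq[String],     // spark.executor.extraClassPath entries
      userJars: Seq[String],           // app jar + anything from SparkContext.addJar
      userClassPathFirst: Boolean): Seq[String] = {
    if (userClassPathFirst) {
      // Proposed order: user jars, then extraClassPath, then compute-classpath.
      // (The Deploy backend today instead puts compute-classpath before
      // extraClassPath when userClassPathFirst is set.)
      userJars ++ extraClassPath ++ computeClasspath
    } else {
      // Current default order across backends.
      computeClasspath ++ extraClassPath ++ userJars
    }
  }
}
```

The only change being proposed is the relative position of
spark.executor.extraClassPath and compute-classpath in the
userClassPathFirst branch.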

Thanks,
Graham