[ 
https://issues.apache.org/jira/browse/SPARK-33864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesha Bhatta updated SPARK-33864:
-----------------------------------
    Summary: How can we submit or initiate multiple Spark applications with a 
single JVM or a few JVMs  (was: How can we submit or initiate multiple application 
with single or few JVM)

> How can we submit or initiate multiple Spark applications with a single JVM 
> or a few JVMs
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-33864
>                 URL: https://issues.apache.org/jira/browse/SPARK-33864
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 2.4.5
>            Reporter: Ramesha Bhatta
>            Priority: Major
>
> How can we have a single JVM (or a few JVM processes) submit multiple 
> applications to the cluster?
> It is observed that each spark-submit opens up to 400 JARs of >1 GB size, 
> creates __spark_conf__XXXX.zip in /tmp, and copies it under the 
> application-specific .staging directory. When submits run concurrently, the 
> number of JVMs a server can support is limited, and the submit step alone is 
> expensive.
> In our use case, creating this zip file literally millions of times without 
> any actual change in configuration is not efficient; there should be an option 
> to create it only when needed and an option to re-use (cache) it.
> The direct impact is that any submission with concurrency >40 (the number of 
> hyper-threaded cores) leads to failures and CPU overload on the gateway (GW). 
> We tried Livy, but noticed that in the background it also performs a 
> spark-submit, so the same problem persists: we get "response code 404" and 
> observe the same CPU overload on the server running Livy. The concurrency 
> comes from mini-batches arriving over REST, and we aim to support 2000+ 
> concurrent requests as long as the cluster has the resources. For this, 
> spark-submit is the major bottleneck because of the situation explained above. 
> For JAR submission we have more than one work-around: (1) pre-distribute the 
> JARs to a specified folder and refer to them with the local: keyword, or 
> (2) stage the JARs in an HDFS location and specify the HDFS reference, so 
> there is no file copy per application (a sketch of (2) follows below).
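> As an illustration of work-around (2) only (a sketch, not part of the 
> proposal; spark.yarn.jars is the standard setting for pointing YARN 
> applications at pre-staged JARs, and the HDFS path below is made up):
>  ==
>  // Point the application at JARs already staged in HDFS, so spark-submit
>  // does not have to copy them for every application.
>  import org.apache.spark.SparkConf
>  val conf = new SparkConf()
>    .set("spark.yarn.jars", "hdfs:///apps/spark/jars/*")  // example path (assumption)
>  ==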
> Looking at the code in yarn/Client.scala, it appears possible to make this 
> change in spark-submit, hence this enhancement request. 
>  Please prioritize.
> I guess the change needed is in 
> [https://github.com/apache/spark/blob/48f93af9f3d40de5bf087eb1a06c1b9954b2ad76/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala]
>  around line 745 ("val confArchive = File.createTempFile(LOCALIZED_CONF_DIR, 
> ".zip", new File(Utils.getLocalDir(sparkConf)))").
> Adding some logic, such as checking when the file was last created or whether 
> it already exists, to avoid re-creating it repetitively/excessively is the 
> right thing to do.
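> A minimal sketch of what that logic could look like (the helper and names 
> below are hypothetical, for illustration only, not the real Client.scala 
> members):
>  ==
>  import java.io.File
>  // Reuse a previously built conf archive if it is newer than every conf file;
>  // otherwise rebuild it. localDir, confFiles and buildConfArchive are assumptions.
>  def getOrCreateConfArchive(localDir: File, confFiles: Seq[File])
>                            (buildConfArchive: File => Unit): File = {
>    val cached = new File(localDir, "__spark_conf__cached.zip")
>    val newestConf = confFiles.map(_.lastModified()).foldLeft(0L)(_ max _)
>    if (!cached.exists() || cached.lastModified() < newestConf) {
>      buildConfArchive(cached)  // only rebuild when configuration actually changed
>    }
>    cached
>  }
>  ==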
> The second change is to avoid distributing this archive for every application 
> and instead reuse it from a shared HDFS location; the relevant code from 
> Client.scala is quoted below, followed by a rough sketch of the reuse.
>  ==
>  // Upload the conf archive to HDFS manually, and record its location in the configuration.
>  // This will allow the AM to know where the conf archive is in HDFS, so that it can be
>  // distributed to the containers.
>  //
>  // This code forces the archive to be copied, so that unit tests pass (since in that case both
>  // file systems are the same and the archive wouldn't normally be copied). In most (all?)
>  // deployments, the archive would be copied anyway, since it's a temp file in the local file
>  // system.
>  val remoteConfArchivePath = new Path(destDir, LOCALIZED_CONF_ARCHIVE)
>  val remoteFs = FileSystem.get(remoteConfArchivePath.toUri(), hadoopConf)
>  cachedResourcesConf.set(CACHED_CONF_ARCHIVE, remoteConfArchivePath.toString())
>  val localConfArchive = new Path(createConfArchive().toURI())
>  copyFileToRemote(destDir, localConfArchive, replication, symlinkCache, force = true,
>    destName = Some(LOCALIZED_CONF_ARCHIVE))
>  ==
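> A rough sketch of the requested reuse (the setting name and helper below are 
> hypothetical, not existing Spark code): if a shared, pre-populated archive 
> already exists in HDFS, skip the per-application upload; otherwise fall back 
> to the current behaviour.
>  ==
>  import org.apache.hadoop.conf.Configuration
>  import org.apache.hadoop.fs.{FileSystem, Path}
>  // sharedArchive would come from a new setting such as
>  // "spark.yarn.cached.conf.archive" (hypothetical).
>  def resolveConfArchive(sharedArchive: Option[Path], hadoopConf: Configuration)
>                        (uploadPerApplication: () => Path): Path = {
>    sharedArchive match {
>      case Some(path) if FileSystem.get(path.toUri, hadoopConf).exists(path) =>
>        path                      // reuse the shared archive, no per-app copy
>      case _ =>
>        uploadPerApplication()    // existing behaviour: build and upload per app
>    }
>  }
>  ==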
> Regards,
>  -Ramesh



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
