[ 
https://issues.apache.org/jira/browse/SPARK-33864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesha Bhatta updated SPARK-33864:
-----------------------------------
    Summary: How can we submit or initiate multiple application with single or 
few JVM  (was: Avoid re-creating __spark_conf__5678XXXX.zip  in /tmp for each 
application submit and copy under application specific .staging directory)

> How can we submit or initiate multiple application with single or few JVM
> -------------------------------------------------------------------------
>
>                 Key: SPARK-33864
>                 URL: https://issues.apache.org/jira/browse/SPARK-33864
>             Project: Spark
>          Issue Type: Improvement
>          Components: Deploy
>    Affects Versions: 2.4.5
>            Reporter: Ramesha Bhatta
>            Priority: Major
>
> Avoid re-creating __spark_conf__5678XXXX.zip in /tmp for each application 
> submit and copy under application specific .staging directory
> In our use case, this zip file is created literally millions of times without 
> any actual change in configuration, which is inefficient; there should be an 
> option to create it on demand and to reuse (cache) it.
> The direct impact is that any submission with concurrency >40 (the number of 
> hyperthreaded cores) leads to failures and CPU overload on the gateway. We 
> tried Livy, but noticed that in the background it also performs a 
> spark-submit, so the same problem persists: we get "response code 404" and 
> observe the same CPU overload on the server running Livy. The concurrency 
> comes from mini-batches arriving over REST, and we aim to support 2000+ 
> concurrent requests as long as the cluster has the resources. spark-submit is 
> the major bottleneck because of the situation described above. For JAR 
> submission we have more than one workaround: (1) pre-distribute the JARs to a 
> specified folder and refer to them via the local keyword, or (2) stage the 
> JARs in an HDFS location and reference them there, so no file copy happens 
> per application.
> Looking at the code in yarn/Client.scala, it appears possible to make this 
> change in spark-submit, so I am raising an enhancement request. 
> Please prioritize.
> I guess the change needed is in 
> https://github.com/apache/spark/blob/48f93af9f3d40de5bf087eb1a06c1b9954b2ad76/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
>  at line 745 ("val confArchive = File.createTempFile(LOCALIZED_CONF_DIR, 
> ".zip", new File(Utils.getLocalDir(sparkConf)))").
> Adding logic such as checking when the file was last created and whether it 
> already exists, and avoiding recreating it repetitively/excessively, would be 
> the right thing to do.
> The second change is to avoid distributing this archive for every 
> application and instead reuse it from a shared HDFS location.
> ==
> // Upload the conf archive to HDFS manually, and record its location in the configuration.
> // This will allow the AM to know where the conf archive is in HDFS, so that it can be
> // distributed to the containers.
> //
> // This code forces the archive to be copied, so that unit tests pass (since in that case both
> // file systems are the same and the archive wouldn't normally be copied). In most (all?)
> // deployments, the archive would be copied anyway, since it's a temp file in the local file
> // system.
> val remoteConfArchivePath = new Path(destDir, LOCALIZED_CONF_ARCHIVE)
> val remoteFs = FileSystem.get(remoteConfArchivePath.toUri(), hadoopConf)
> cachedResourcesConf.set(CACHED_CONF_ARCHIVE, remoteConfArchivePath.toString())
> val localConfArchive = new Path(createConfArchive().toURI())
> copyFileToRemote(destDir, localConfArchive, replication, symlinkCache, force = true,
>   destName = Some(LOCALIZED_CONF_ARCHIVE))
> ===
> Regards,
> -Ramesh
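[Editor's note] The caching idea proposed for the first change can be sketched in plain Scala, outside of Spark's actual code. The approach: fingerprint the configuration inputs and reuse a previously built __spark_conf__ archive when the fingerprint is unchanged. All names here (ConfArchiveCache, confFingerprint, cachedConfArchive) are illustrative assumptions, not existing Spark APIs.

```scala
import java.io.File
import java.nio.file.Files
import java.security.MessageDigest

// Hypothetical sketch: reuse a previously built conf archive when the
// configuration inputs have not changed, instead of re-zipping on every submit.
object ConfArchiveCache {

  // Fingerprint the conf files that would go into the archive
  // (file names plus contents, hashed with SHA-256).
  def confFingerprint(confFiles: Seq[File]): String = {
    val md = MessageDigest.getInstance("SHA-256")
    confFiles.sortBy(_.getName).foreach { f =>
      md.update(f.getName.getBytes("UTF-8"))
      if (f.isFile) md.update(Files.readAllBytes(f.toPath))
    }
    md.digest().map("%02x".format(_)).mkString
  }

  // Return a cached archive if one already exists for this fingerprint;
  // otherwise build a fresh one via `create`. Archives built for identical
  // configurations are thus shared across submits.
  def cachedConfArchive(cacheDir: File, confFiles: Seq[File])
                       (create: File => Unit): File = {
    val fp = confFingerprint(confFiles)
    val archive = new File(cacheDir, s"__spark_conf__$fp.zip")
    if (!archive.exists()) {
      create(archive) // e.g. zip the conf files into `archive`
    }
    archive
  }
}
```

A real implementation inside yarn/Client.scala would also need to handle concurrent submits racing to build the same archive (e.g. write to a temp file and rename atomically), but the fingerprint-keyed file name above shows the core reuse mechanism.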



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
