[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239042#comment-15239042
 ] 

Steve Loughran commented on SPARK-11157:
----------------------------------------

File a Spark bug; if it needs escalating to YARN, a YARN issue can be linked to it.

What's clearly surfacing is that the {{SPARK_YARN_CACHE_FILES}} env var is 
exceeding the line-length limit; the variable is set up at launch in 
{{ClientDistributedCacheManager}} and picked up in {{ExecutorRunnable}}.

I haven't seen this in YARN before, but I have hit the problem in other apps 
(hello, Ant {{<java>}} task). The general strategy is: save the data to a file 
and use the environment variable to point to that file, rather than setting the 
data itself on the CLI.

It'll be slightly complex with the YARN launch: any such file will have to be 
localized.

If a simple {{java.util.Properties}} file is used, it's trivial to work with, 
and it sets the process up for all env vars related to the cache files and 
{{SPARK_YARN_CACHE_ARCHIVES}}. 
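
A minimal sketch of that idea, not taken from the Spark code: the client writes 
the long list into a properties file, and the launched process reads it back 
from a path that a short env var could point to. The class name 
{{CacheFileList}} and the key {{spark.yarn.cache.files}} are hypothetical, and 
the sketch assumes the URIs themselves contain no commas. Localizing the file 
through YARN is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class CacheFileList {
    // Hypothetical property key; not an existing Spark setting.
    static final String FILES_KEY = "spark.yarn.cache.files";

    // Client side: store the (possibly very long) list in a file
    // instead of placing it directly in an environment variable.
    static void write(Path target, List<String> uris) throws IOException {
        Properties props = new Properties();
        props.setProperty(FILES_KEY, String.join(",", uris));
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            props.store(out, "distributed cache entries");
        }
    }

    // Launched-process side: the env var would carry only the path to this file.
    static List<String> read(Path source) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        String value = props.getProperty(FILES_KEY, "");
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split(","));
    }
}
```

The same file could carry further keys (for example one for the archives), so 
the environment only ever needs to hold one short path.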



> Allow Spark to be built without assemblies
> ------------------------------------------
>
>                 Key: SPARK-11157
>                 URL: https://issues.apache.org/jira/browse/SPARK-11157
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Spark Core, YARN
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 2.0.0
>
>         Attachments: no-assemblies.pdf
>
>
> For reasoning, discussion of pros and cons, and other more detailed 
> information, please see attached doc.
> The idea is to be able to build a Spark distribution that has just a 
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places, I'll try to list the 
> ones I identified in the document, in the order that I think would be needed 
> to not break things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their 
> deployments, we can't really remove them; but we can make them be dummy jars 
> and rely on dependency resolution to download all the jars.
> PySpark tests would also need some tweaking here.
> * make examples jar not be an assembly
> Probably requires tweaks to the {{run-example}} script. The location of the 
> examples jar would have to change (it won't be able to live in the same place 
> as the main Spark jars anymore).
> * update YARN backend to handle a directory full of jars when launching apps
> Currently YARN localizes the Spark assembly (depending on the user 
> configuration); it needs to be modified so that it can localize all needed 
> libraries instead of a single jar.
> * Modify launcher library to handle the jars directory
> This should be trivial
> * Modify {{assembly/pom.xml}} to generate assembly or a {{libs}} directory 
> depending on which profile is enabled.
> We should keep the option to build with the assembly on by default, for 
> backwards compatibility, to give people time to prepare.
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on 
> a specific part of the issue.



