[
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239042#comment-15239042
]
Steve Loughran commented on SPARK-11157:
----------------------------------------
File a Spark bug; if it needs escalation to YARN, a YARN issue can be linked to it.
What's clearly surfacing is the line-length limit being exceeded by the
{{SPARK_YARN_CACHE_FILES}} env var; this is set up at launch in
{{ClientDistributedCacheManager}} and picked up in {{ExecutorRunnable}}.
I haven't seen this in YARN before, but I have hit the problem in other apps
(hello, Ant {{<java>}} task). The general strategy is: save the data to a file
and use the environment variable to point to that file, rather than setting
the data itself on the CLI.
It'll be slightly complex with the YARN launch: any such file will have to be
localized.
If a simple {{java.util.Properties}} file is used, it's trivial to work with,
and it sets the process up to carry all the env vars related to the cache files
and {{SPARK_YARN_CACHE_ARCHIVES}}.
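A minimal sketch of that strategy follows. It assumes a hypothetical property key, {{spark.yarn.cache.files}}, and a comma-separated value; neither is taken from the Spark code, and the class name is purely illustrative. The launcher side would write the list to a properties file (which YARN would then have to localize), and the container side would load it back from a path carried in a short env var:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Illustrative helper, not part of Spark.
final class CacheFileList {
    // Hypothetical key; an assumption made for this sketch only.
    static final String CACHE_FILES_KEY = "spark.yarn.cache.files";

    // Launcher side: persist the (potentially very long) list to a file
    // instead of packing it into an environment variable.
    static void save(Path target, List<String> uris) throws IOException {
        Properties props = new Properties();
        // Assumes no URI contains a comma.
        props.setProperty(CACHE_FILES_KEY, String.join(",", uris));
        try (OutputStream out = Files.newOutputStream(target)) {
            props.store(out, "distributed cache entries");
        }
    }

    // Container side: the env var now only holds the path to this file.
    static List<String> load(Path source) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(source)) {
            props.load(in);
        }
        String joined = props.getProperty(CACHE_FILES_KEY, "");
        return joined.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(joined.split(","));
    }
}
```

With something along these lines, the env var's length no longer grows with the number of cached files, and further keys (for example, one for the archives) could be added to the same file.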
> Allow Spark to be built without assemblies
> ------------------------------------------
>
> Key: SPARK-11157
> URL: https://issues.apache.org/jira/browse/SPARK-11157
> Project: Spark
> Issue Type: Umbrella
> Components: Build, Spark Core, YARN
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Fix For: 2.0.0
>
> Attachments: no-assemblies.pdf
>
>
> For reasoning, discussion of pros and cons, and other more detailed
> information, please see attached doc.
> The idea is to be able to build a Spark distribution that has just a
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places; I'll try to list the
> ones I identified in the document, in the order that I think would be needed
> to not break things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their
> deployments, we can't really remove them; but we can make them be dummy jars
> and rely on dependency resolution to download all the jars.
> PySpark tests would also need some tweaking here.
> * make examples jar not be an assembly
> Probably requires tweaks to the {{run-example}} script. The location of the
> examples jar would have to change (it won't be able to live in the same place
> as the main Spark jars anymore).
> * update YARN backend to handle a directory full of jars when launching apps
> Currently YARN localizes the Spark assembly (depending on the user
> configuration); it needs to be modified so that it can localize all needed
> libraries instead of a single jar.
> * Modify launcher library to handle the jars directory
> This should be trivial.
> * Modify {{assembly/pom.xml}} to generate assembly or a {{libs}} directory
> depending on which profile is enabled.
> We should keep the option to build with the assembly on by default, for
> backwards compatibility, to give people time to prepare.
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on
> a specific part of the issue.