[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125564#comment-15125564
 ] 

Marcelo Vanzin commented on SPARK-11157:
----------------------------------------

bq. Prior to removing the assemblies, it would be great if we could reconfigure 
our tests to not depend on the full assembly JAR

I'm pretty sure that's already the case for all Scala tests (see SPARK-9284). I 
think pyspark tests still need the streaming assemblies to work.

bq. Building up a -classpath argument that lists hundreds of JARs

You don't need to do that; you can use a wildcard ("-classpath 
/path/to/libs/*"). The JVM interprets that as "all the jar files under the 
libs directory".
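
A quick sketch of why the quoting matters (directory and jar names here are 
made up for illustration):

```shell
# Hypothetical layout: a directory full of jars.
mkdir -p /tmp/demo-libs
touch /tmp/demo-libs/spark-core.jar /tmp/demo-libs/spark-sql.jar

# Quote the wildcard so the shell passes it through untouched; the JVM
# itself expands "dir/*" to every *.jar file directly in that directory
# (non-recursive, .jar files only), e.g.:
#   java -classpath "/tmp/demo-libs/*" org.example.Main
# Unquoted, the shell expands the glob first, so java would receive a
# list of file names instead of the single wildcard argument:
echo "/tmp/demo-libs/*"      # literal, as the JVM should receive it
echo /tmp/demo-libs/*        # shell-expanded file list
```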

bq. This is going to require changes to Launcher, shell scripts, and a few 
other places

That's already scoped out in the linked document and in the bug summary; it's 
actually not a lot of work, especially if we don't keep the option to generate 
assemblies around.

> Allow Spark to be built without assemblies
> ------------------------------------------
>
>                 Key: SPARK-11157
>                 URL: https://issues.apache.org/jira/browse/SPARK-11157
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Build, Spark Core, YARN
>            Reporter: Marcelo Vanzin
>         Attachments: no-assemblies.pdf
>
>
> For reasoning, discussion of pros and cons, and other more detailed 
> information, please see attached doc.
> The idea is to be able to build a Spark distribution that has just a 
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places; I'll try to list the 
> ones I identified in the document, in the order I think is needed to avoid 
> breaking things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their 
> deployments, we can't really remove them; but we can turn them into dummy 
> jars and rely on dependency resolution to download the real jars.
> PySpark tests would also need some tweaking here.
> * make examples jar not be an assembly
> Probably requires tweaks to the {{run-example}} script. The location of the 
> examples jar would have to change (it won't be able to live in the same place 
> as the main Spark jars anymore).
> * update YARN backend to handle a directory full of jars when launching apps
> Currently YARN localizes the Spark assembly (depending on the user 
> configuration); it needs to be modified so that it can localize all needed 
> libraries instead of a single jar.
> * Modify launcher library to handle the jars directory
> This should be trivial.
> * Modify {{assembly/pom.xml}} to generate an assembly or a {{libs}} directory 
> depending on which profile is enabled.
> We should keep the assembly build enabled by default, for backwards 
> compatibility, to give people time to prepare.
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on 
> a specific part of the issue.
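
The launcher change described above could amount to something like the 
following sketch (the {{$SPARK_HOME}} layout and variable names here are 
assumptions for illustration, not the real distribution layout):

```shell
# Sketch only: pick a classpath source depending on which layout exists.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
if [ -d "$SPARK_HOME/jars" ]; then
  # New layout: a directory full of jars; let the JVM expand the wildcard.
  LAUNCH_CLASSPATH="$SPARK_HOME/jars/*"
else
  # Old layout: fall back to the single assembly jar.
  LAUNCH_CLASSPATH="$SPARK_HOME/lib/spark-assembly.jar"
fi
echo "$LAUNCH_CLASSPATH"
```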



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
