[ https://issues.apache.org/jira/browse/SPARK-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568012#comment-14568012 ]

Kannan Rajah commented on SPARK-4048:
-------------------------------------

Just for completeness of this discussion, let me point out how the cases 
differ.

Case 1:
Spark uses code from Jar A.
This is the same as the Curator case. Spark should bundle Jar A.

Case 2:
Spark uses code from Jar B that internally uses Jar A.
This is the same as the Hadoop API case. Spark can use a B-provided profile and 
not bundle Jar A or Jar B; basically, B and its dependencies are expected to be 
provided at run time.

Case 3:
Spark uses code from Jar A and also code from Jar B that internally uses Jar A.
Spark should bundle Jar A. But this is the problematic case, because we could 
end up with two different versions of Jar A on the classpath.
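Case 2 above can be sketched as a Maven profile that flips a dependency's scope to "provided". This fragment is purely illustrative (the profile id, property name, and artifact coordinates are hypothetical, not Spark's actual POM), but it mirrors the property-driven-scope pattern Spark's own hadoop-provided profile uses:

```xml
<!-- Hypothetical "b-provided" profile: when active, Jar B (and, because
     Maven scopes propagate, its transitive dependency Jar A) is marked
     provided, so the assembly does not bundle either jar. -->
<profiles>
  <profile>
    <id>b-provided</id>
    <properties>
      <!-- Overrides the default compile scope below. -->
      <jarB.deps.scope>provided</jarB.deps.scope>
    </properties>
  </profile>
</profiles>

<properties>
  <!-- Default: bundle Jar B normally. -->
  <jarB.deps.scope>compile</jarB.deps.scope>
</properties>

<dependencies>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>jar-b</artifactId>
    <version>1.0</version>
    <scope>${jarB.deps.scope}</scope>
  </dependency>
</dependencies>
```

Activating the profile (`mvn -Pb-provided ...`) then keeps Jar B and its dependency tree out of the assembly without touching the dependency declarations themselves.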

Overall, though, I see your point. Based on your comments, this profile was 
added to satisfy a specific use case, so I won't file a JIRA.
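For reference, the profile under discussion is enabled at build time, and the "provided" jars then have to be supplied on the classpath when Spark runs. A sketch (the exact flags and the classpath mechanism vary by Spark version):

```
# Build the assembly without bundling Hadoop jars
# (hadoop-provided is the profile this issue enhances).
mvn -Phadoop-provided -DskipTests clean package

# At run time the provided jars must be put back on the classpath,
# e.g. by pointing Spark at a local Hadoop installation's jars;
# `hadoop classpath` prints that jar list.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

The classpath gap is exactly the fifth bullet in the issue description below: the launch scripts don't add the provided jars automatically, so users have to wire this up themselves.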

> Enhance and extend hadoop-provided profile
> ------------------------------------------
>
>                 Key: SPARK-4048
>                 URL: https://issues.apache.org/jira/browse/SPARK-4048
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.3.0
>
>
> The hadoop-provided profile is used to not package Hadoop dependencies inside 
> the Spark assembly. It works, sort of, but it could use some enhancements. A 
> quick list:
> - It doesn't include all things that could be removed from the assembly
> - It doesn't work well when you're publishing artifacts based on it 
> (SPARK-3812 fixes this)
> - There are other dependencies that could use similar treatment: Hive, HBase 
> (for the examples), Flume, Parquet, maybe others I'm missing at the moment.
> - Unit tests, more specifically, those that use local-cluster mode, do not 
> work when the assembly is built with this profile enabled.
> - The scripts to launch Spark jobs do not add needed "provided" jars to the 
> classpath when this profile is enabled, leaving it for people to figure that 
> out for themselves.
> - The examples assembly duplicates a lot of things in the main assembly.
> Part of this task is selfish since we build internally with this profile and 
> we'd like to make it easier for us to merge changes without having to keep 
> too many patches on top of upstream. But those feel like good improvements to 
> me, regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
