[
https://issues.apache.org/jira/browse/SPARK-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567807#comment-14567807
]
Marcelo Vanzin commented on SPARK-4048:
---------------------------------------
Perhaps. But hadoop-provided is really targeted at people who are packaging
Spark; cross-version compatibility is not its goal. Those packaging Spark
should make sure that they provide all the needed dependencies, and they kinda
need to reassess those every release, since things change.
Lots of things end up under the "hadoop-provided" umbrella because Hadoop
distributions generally have all those things already available. That's
certainly the case in CDH, for example, which is where I based most of the
changes (I wrote this mostly to make my life easier packaging Spark in CDH).
So I don't really see this as a regression because this is not something that
is meant to be backwards compatible in the first place. If you feel strongly
otherwise, you can always file a new bug and suggest a fix.
(BTW, not that it matters for you, but CDH 5.3, which is based on Hadoop 2.5,
does contain the curator dependency.)
> Enhance and extend hadoop-provided profile
> ------------------------------------------
>
> Key: SPARK-4048
> URL: https://issues.apache.org/jira/browse/SPARK-4048
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 1.2.0
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Fix For: 1.3.0
>
>
> The hadoop-provided profile is used to not package Hadoop dependencies inside
> the Spark assembly. It works, sort of, but it could use some enhancements. A
> quick list:
> - It doesn't cover everything that could be removed from the assembly
> - It doesn't work well when you're publishing artifacts based on it
> (SPARK-3812 fixes this)
> - There are other dependencies that could use similar treatment: Hive, HBase
> (for the examples), Flume, Parquet, maybe others I'm missing at the moment.
> - Unit tests, more specifically, those that use local-cluster mode, do not
> work when the assembly is built with this profile enabled.
> - The scripts to launch Spark jobs do not add needed "provided" jars to the
> classpath when this profile is enabled, leaving it for people to figure that
> out for themselves.
> - The examples assembly duplicates a lot of things in the main assembly.
> Part of this task is selfish since we build internally with this profile and
> we'd like to make it easier for us to merge changes without having to keep
> too many patches on top of upstream. But those feel like good improvements to
> me, regardless.
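> To make the intended workflow concrete, here is a rough sketch of how a
> hadoop-provided build and launch might look. This assumes the Maven
> `-Phadoop-provided` profile, the `make-distribution.sh` script, and a
> `SPARK_DIST_CLASSPATH` variable for supplying the "provided" jars at
> runtime; the Hadoop profile version is a placeholder:
>
> ```shell
> # Build a Spark distribution without bundling Hadoop (and related) jars.
> # The -Phadoop-provided profile marks those dependencies as "provided",
> # so they are left out of the assembly.
> ./make-distribution.sh --name hadoop-provided -Phadoop-provided -Phadoop-2.4
>
> # At runtime, point Spark at the Hadoop jars already on the machine.
> # `hadoop classpath` prints the classpath of the local Hadoop install.
> export SPARK_DIST_CLASSPATH=$(hadoop classpath)
> ```
>
> With the classpath exported this way, the launch scripts can add the
> provided jars without users having to track them down by hand.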
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)