[ https://issues.apache.org/jira/browse/SPARK-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567783#comment-14567783 ]
Kannan Rajah commented on SPARK-4048:
-------------------------------------
Some clarification ...
- My understanding of hadoop-provided is that Hadoop and its dependent jars
are available as part of the Hadoop installation and can be linked into the
Spark classpath; they need not be bundled inside the Spark assembly itself.
- A Hadoop 2.5 installation ships no curator jars, so even after you add all
the jars from the Hadoop installation dir, you will still hit this problem.
It does not happen with Hadoop 2.7, because the curator jars are present
under its share/hadoop/common/lib dir.
- Since the curator jar is used directly by the Spark class
ZooKeeperLeaderElectionAgent (outside the context of Hadoop itself), it
should not be subject to any profile. Perhaps the current behavior was meant
to avoid a curator version conflict between Hadoop and Spark, but it is
still a regression when a customer on Hadoop 2.5 upgrades from Spark 1.2 to
a Spark 1.3 built with the hadoop-provided profile.
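As an illustrative check (the default path and version numbers below are
assumptions, not taken from this issue), one can verify whether a given
Hadoop installation actually ships the curator jars before relying on a
hadoop-provided Spark build:

```shell
#!/bin/sh
# Sketch: does the local Hadoop install provide curator jars?
# Hadoop 2.7 ships them under share/hadoop/common/lib; Hadoop 2.5 does not.
# HADOOP_HOME and the /opt/hadoop fallback are illustrative assumptions.
HADOOP_LIB="${HADOOP_HOME:-/opt/hadoop}/share/hadoop/common/lib"
if ls "$HADOOP_LIB"/curator-*.jar >/dev/null 2>&1; then
  echo "curator jars provided by Hadoop:"
  ls "$HADOOP_LIB"/curator-*.jar
else
  echo "no curator jars in $HADOOP_LIB;"
  echo "a hadoop-provided Spark build will not find them at runtime"
fi
```

With Hadoop 2.5 the else branch fires, which is the regression described
above: ZooKeeperLeaderElectionAgent needs curator on the classpath no matter
which Hadoop version supplies the rest of the jars.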
> Enhance and extend hadoop-provided profile
> ------------------------------------------
>
> Key: SPARK-4048
> URL: https://issues.apache.org/jira/browse/SPARK-4048
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 1.2.0
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Fix For: 1.3.0
>
>
> The hadoop-provided profile is used to not package Hadoop dependencies inside
> the Spark assembly. It works, sort of, but it could use some enhancements. A
> quick list:
> - It doesn't include all things that could be removed from the assembly
> - It doesn't work well when you're publishing artifacts based on it
> (SPARK-3812 fixes this)
> - There are other dependencies that could use similar treatment: Hive, HBase
> (for the examples), Flume, Parquet, maybe others I'm missing at the moment.
> - Unit tests, specifically those that use local-cluster mode, do not work
> when the assembly is built with this profile enabled.
> - The scripts to launch Spark jobs do not add needed "provided" jars to the
> classpath when this profile is enabled, leaving it for people to figure that
> out for themselves.
> - The examples assembly duplicates a lot of things in the main assembly.
> Part of this task is selfish since we build internally with this profile and
> we'd like to make it easier for us to merge changes without having to keep
> too many patches on top of upstream. But those feel like good improvements to
> me, regardless.
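For context, building the assembly with the profile discussed here looks
roughly like the following. The exact flags vary by Spark release, and the
SPARK_DIST_CLASSPATH mechanism is how later Spark releases document feeding
"provided" jars back to the runtime classpath; treating it as available in
the 1.3 era is an assumption, not something confirmed by this issue.

```shell
# Sketch: build Spark without bundling Hadoop classes in the assembly.
# Profile names are illustrative for the Spark 1.3-era Maven build.
mvn -Phadoop-provided -DskipTests clean package

# The "provided" jars must then reach the classpath some other way,
# e.g. by pointing Spark at the Hadoop installation's own jars:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```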
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)