[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011851#comment-14011851
]
Patrick Wendell commented on SPARK-1518:
----------------------------------------
bq. In practice it looks like one generic Hadoop 1, Hadoop 2, and CDH 4 release
is produced, and one set of Maven artifacts. (PS: again, I am not sure Spark
should contain a CDH-specific distribution, realizing it's really a proxy for a
particular Hadoop combo. The same goes for a MapR profile, which is really for
vendors to maintain.) That means that right now you can't build a Spark app for
anything but Hadoop 1.x with Maven without installing it yourself, and there's
no official distro for anything but two major Hadoop versions. Support for
niche versions isn't really there or promised anyway, and fleshing out
"support" may make doing so pretty burdensome.
We need to update the list of binary builds for Spark... some are getting
outdated. The workflow for people building Spark apps is that they write their
app against the Spark APIs in Maven Central (they can do this no matter which
cluster they want to run on). To run the app locally, they can spark-submit it
from any compiled package of Spark, or use their build tool to run it directly.
If they want to submit it to a cluster, users need a Spark package compiled for
the Hadoop version on the cluster. Because of this we distribute pre-compiled
builds so that people never have to compile Spark themselves.
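As a sketch of that workflow, an application's build needs to depend only on the published artifacts; the cluster's Hadoop version is resolved at submit time. The coordinates below are illustrative (spark-core_2.10 0.9.1 was a contemporary release), and the provided scope is one reasonable choice since Spark is supplied by the cluster:

```xml
<!-- Illustrative Maven dependency: the app compiles against the published
     Spark API only, regardless of which Hadoop version the cluster runs. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.1</version>
  <!-- provided: the Spark runtime on the cluster supplies these classes -->
  <scope>provided</scope>
</dependency>
```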
In terms of vendor-specific builds, we've done this because users asked for it.
It's useful if, e.g., a user wants to submit a Spark job to a CDH or MapR
cluster, or run spark-shell locally and read data from a CDH HDFS cluster.
That's the main use case we want to support.
I don't know what it means that you "can't build a Spark app" for Hadoop 2.x.
Building a Spark app is intentionally decoupled from the process of submitting
an app to a cluster. We want users to be able to build Spark apps that they can
run on, e.g., different versions of Hadoop.
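A sketch of that decoupling from the command line (the class, jar, master URL,
and package directory names here are hypothetical; only the Spark download
changes per Hadoop version, while the application jar is built once):

```
# Same application jar, built once against the Maven artifacts.
# Submit it with a Spark package compiled for the cluster's Hadoop 1.x:
./spark-bin-hadoop1/bin/spark-submit \
  --class com.example.MyApp --master spark://master:7077 my-app.jar

# ...or with a package compiled for Hadoop 2.x; the jar is unchanged:
./spark-bin-hadoop2/bin/spark-submit \
  --class com.example.MyApp --master spark://master:7077 my-app.jar
```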
> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
> Key: SPARK-1518
> URL: https://issues.apache.org/jira/browse/SPARK-1518
> Project: Spark
> Issue Type: Bug
> Reporter: Marcelo Vanzin
> Assignee: Colin Patrick McCabe
> Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.
--
This message was sent by Atlassian JIRA
(v6.2#6252)