[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011851#comment-14011851
]
Patrick Wendell commented on SPARK-1518:
----------------------------------------
bq. In practice it looks like one generic Hadoop 1, Hadoop 2, and CDH 4 release
is produced, and one set of Maven artifacts. (PS: again, I am not sure Spark
should contain a CDH-specific distribution, realizing it's really a proxy for a
particular Hadoop combo. The same goes for a MapR profile, which is really for
vendors to maintain.) That means that right now you can't build a Spark app for
anything but Hadoop 1.x with Maven without installing it yourself, and there's
no official distro for anything but two major Hadoop versions. Support for
niche versions isn't really there or promised anyway, and fleshing out
"support" may make doing so pretty burdensome.
We need to update the list of binary builds for Spark... some are getting
outdated. The workflow for people building Spark apps is that they write their
app against the Spark APIs in Maven Central (they can do this no matter which
cluster they want to run on). To run the app locally, they can spark-submit it
from any compiled package of Spark, or use their build tool to run it directly.
If they want to submit it to a cluster, users need a Spark package compiled for
the Hadoop version on the cluster. Because of this we distribute pre-compiled
builds so that people never have to compile Spark themselves.
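As a sketch of that workflow, an application's build needs to depend only on the published artifacts; the cluster's Hadoop version is resolved at submit time. The coordinates below are illustrative (spark-core_2.10 0.9.1 was a contemporary release), and the provided scope is one reasonable choice since Spark is supplied by the cluster:

```xml
<!-- Illustrative Maven dependency: the app compiles against the published
     Spark API only, regardless of which Hadoop version the cluster runs. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.1</version>
  <!-- provided: the Spark runtime on the cluster supplies these classes -->
  <scope>provided</scope>
</dependency>
```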
In terms of vendor-specific builds, we've done this because users asked for it.
It's useful if, e.g., a user wants to submit a Spark job to a CDH or MapR
cluster, or run spark-shell locally and read data from a CDH HDFS cluster.
That's the main use case we want to support.
I don't know what it means that you "can't build a Spark app" for Hadoop 2.x.
Building a Spark app is intentionally decoupled from the process of submitting
an app to a cluster. We want users to be able to build Spark apps that they can
run on, e.g., different versions of Hadoop.
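A sketch of that decoupling from the command line (the class, jar, master URL,
and package directory names here are hypothetical; only the Spark download
changes per Hadoop version, while the application jar is built once):

```
# Same application jar, built once against the Maven artifacts.
# Submit it with a Spark package compiled for the cluster's Hadoop 1.x:
./spark-bin-hadoop1/bin/spark-submit \
  --class com.example.MyApp --master spark://master:7077 my-app.jar

# ...or with a package compiled for Hadoop 2.x; the jar is unchanged:
./spark-bin-hadoop2/bin/spark-submit \
  --class com.example.MyApp --master spark://master:7077 my-app.jar
```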
> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
> Key: SPARK-1518
> URL: https://issues.apache.org/jira/browse/SPARK-1518
> Project: Spark
> Issue Type: Bug
> Reporter: Marcelo Vanzin
> Assignee: Colin Patrick McCabe
> Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.
--
This message was sent by Atlassian JIRA
(v6.2#6252)