[ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012301#comment-14012301 ]
Sean Owen commented on SPARK-1518:
----------------------------------
Yes, Matei, that's what I'm getting at. Spark is a client of Hadoop, so if I use
Spark, and Spark uses Hadoop, then I have to match the Hadoop version that Spark
uses to the one on the cluster. That holds even if my app doesn't use HDFS
directly. I can manually override hadoop-client, although I'd have to reproduce
a lot of the dependency-graph manipulation in Spark's build to make it work.
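For reference, a manual override of the kind described above might look roughly like the following build.sbt sketch. It is an illustration under stated assumptions, not Spark's documented setup: the versions (spark-core 0.9.1 for Scala 2.10, hadoop-client 2.2.0) are placeholders to be matched to the target cluster, and the sketch deliberately omits the additional exclusions Spark's own build applies, which is the extra work mentioned above.

```scala
// build.sbt: sketch of swapping the hadoop-client that spark-core pulls in.
// All versions below are placeholder assumptions; match them to your cluster.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Drop the hadoop-client version that spark-core declares by default...
  ("org.apache.spark" %% "spark-core" % "0.9.1")
    .exclude("org.apache.hadoop", "hadoop-client"),
  // ...and depend on the one that matches the cluster instead.
  "org.apache.hadoop" % "hadoop-client" % "2.2.0"
)
```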
In Sandy's blog post example, he's just running the code on the cluster and
pointing at the matched Spark/Hadoop jars already there. That's also a solution
that will work for many use cases. I accept that the use case I have in mind,
adding Spark to a larger standalone app, is not everyone's, although it's not
crazy. That solution doesn't work if the Spark/Hadoop jars are instead packaged
together into an assembly and run that way.
I agree overriding the Hadoop dependency is a solution, and I accept that Spark
shouldn't necessarily bend over backwards for these Hadoop issues, but this
does go back to your point about accessibility. Right now, I think anyone who
wants to do what I'm doing for any Hadoop 2 app, and doesn't want to make a
custom build or manually override dependencies, will just point at Cloudera's
"0.9.0-cdh5.0.1" even if not using CDH. That felt funny.
Apologies if I have somehow totally missed something. I've talked too much;
thanks for hearing out the use case. Maybe it's best to see whether anyone else
actually shares this issue.
> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
> Key: SPARK-1518
> URL: https://issues.apache.org/jira/browse/SPARK-1518
> Project: Spark
> Issue Type: Bug
> Reporter: Marcelo Vanzin
> Assignee: Colin Patrick McCabe
> Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.
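The report above leaves open whether hsync() is a safe stand-in for sync(). One way to keep a single binary working across Hadoop versions that expose different methods is to look the method up by reflection at runtime. The sketch below assumes that approach; OldStream, NewStream, and flushCompat are hypothetical names, with the two stream classes standing in for Hadoop's FSDataOutputStream so the example runs without Hadoop on the classpath. It tries hflush() before sync() because Hadoop 2 documents the deprecated sync() as replaced by hflush(); hsync() goes further and asks for data to reach the disk device, so it is not strictly equivalent.

```scala
import java.io.OutputStream
import java.lang.reflect.Method
import scala.util.Try

// Hypothetical stand-ins: OldStream mimics an API exposing only sync(),
// NewStream mimics one exposing only hflush(). Each records what was called.
class OldStream extends OutputStream {
  var calls = List.empty[String]
  override def write(b: Int): Unit = ()
  def sync(): Unit = calls ::= "sync"
}

class NewStream extends OutputStream {
  var calls = List.empty[String]
  override def write(b: Int): Unit = ()
  def hflush(): Unit = calls ::= "hflush"
}

// Return the first public no-arg method, in preference order, that exists.
def findFlushMethod(cls: Class[_], names: Seq[String]): Option[Method] =
  names.view.flatMap(n => Try(cls.getMethod(n)).toOption).headOption

// Invoke hflush() if present, else sync(), else fall back to plain flush().
// Returns the name of the method that was actually called.
def flushCompat(out: OutputStream): String =
  findFlushMethod(out.getClass, Seq("hflush", "sync")) match {
    case Some(m) => m.invoke(out); m.getName
    case None    => out.flush(); "flush"
  }
```

In a real logger, resolving the Method once and reusing it would avoid repeating the reflective lookup on every flush.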
--
This message was sent by Atlassian JIRA
(v6.2#6252)