[ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012651#comment-14012651 ]
Matei Zaharia commented on SPARK-1518:
--------------------------------------
Okay, got it. But this only applies when you run the job on your laptop,
right? Otherwise you'll get the right Hadoop from the installation on the
cluster.
For this use case I still think it's fine to require use of hadoop-client. It's
been like that for the past 2 releases and nobody has asked questions about it.
It's just one more entry to add to your pom.xml.
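For illustration, that extra entry could look roughly like the sketch below. The groupId and artifactId are the usual hadoop-client coordinates; the hadoop.version property and its value are placeholders (not from this thread) that you would set to whatever version your cluster actually runs.

```xml
<!-- Sketch only: hadoop.version is a placeholder; set it to match your cluster's Hadoop -->
<properties>
  <hadoop.version>REPLACE_WITH_CLUSTER_VERSION</hadoop.version>
</properties>

<dependencies>
  <!-- Pulls in the Hadoop client libraries that match your cluster -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```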
The concrete problem is that Hadoop has been extremely fickle with
compatibility even within a major release series (1.x or 2.x). HDFS protocol
versions change so that you can't access the cluster, YARN versions change,
etc. I don't think there's a single release I'd call "Hadoop 2", and it would
be confusing to users to link to the "Hadoop 2" artifact and then not have it
run on their cluster.
> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
> Key: SPARK-1518
> URL: https://issues.apache.org/jira/browse/SPARK-1518
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Marcelo Vanzin
> Assignee: Colin Patrick McCabe
> Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.