[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012520#comment-14012520
]
Matei Zaharia commented on SPARK-1518:
--------------------------------------
Sorry, I'm still not sure I understand what you're asking for -- maybe I missed
it above. Are you worried that the Spark assembly on the cluster has to be
pre-built against Hadoop? We could perhaps make it pick up the Hadoop jars from
HADOOP_HOME, but then it wouldn't work for users who don't have a Hadoop
installation, which is a lot of users. For client apps, it's really enough to
add that hadoop-client dependency. No other manipulation is needed.
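As a rough sketch, the client app's build just declares hadoop-client alongside
Spark; the group/artifact IDs are the real ones, but the version numbers below
are only illustrative, so pick the ones matching your cluster:

    // build.sbt (versions are illustrative)
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.0.0",
      "org.apache.hadoop" %  "hadoop-client" % "2.4.0"
    )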
If you want to build a client app that automatically works with multiple
versions of Hadoop, you can also package it with Spark and hadoop-client marked
as "provided" and use spark-submit to put the Spark assembly on your cluster in
the classpath. Then it will work with whatever version the assembly was built
against. But you do need to specify hadoop-client when you run without
spark-submit if you want to talk to the version of HDFS in your cluster (e.g.
when you're testing the app on your laptop and trying to make it read from
HDFS).
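A sketch of the "provided" approach, assuming something like sbt-assembly for
packaging (again, version numbers are illustrative):

    // build.sbt: Spark and Hadoop come from the cluster at runtime
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.0.0" % "provided",
      "org.apache.hadoop" %  "hadoop-client" % "2.4.0" % "provided"
    )

You would then submit the resulting jar with spark-submit, so the cluster's
assembly supplies those classes; the class name, master URL, and jar name here
are placeholders:

    ./bin/spark-submit --class com.example.MyApp \
      --master spark://master:7077 my-app.jar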
> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
> Key: SPARK-1518
> URL: https://issues.apache.org/jira/browse/SPARK-1518
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Marcelo Vanzin
> Assignee: Colin Patrick McCabe
> Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.
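For reference, a minimal Scala sketch of the substitution described in the
issue above; the path and the write are illustrative, only the sync()/hsync()
swap is the point:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs  = FileSystem.get(new Configuration())
    val out = fs.create(new Path("/tmp/spark-event-log"))  // illustrative path
    out.write("event".getBytes("UTF-8"))
    // out.sync()  // removed from hadoop-common trunk
    out.hsync()    // flushes and syncs the data to the datanodes
    out.close()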
--
This message was sent by Atlassian JIRA
(v6.2#6252)