[ https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012520#comment-14012520 ]
Matei Zaharia commented on SPARK-1518:
--------------------------------------

Sorry, I'm still not sure I understand what you're asking for -- maybe I missed it above. Are you worried that the Spark assembly on the cluster has to be pre-built against Hadoop? We could perhaps make it find stuff out of HADOOP_HOME, but then it wouldn't work for users that don't have a Hadoop installation, which is a lot of users.

For client apps, it's really enough to add that hadoop-client dependency. No other manipulation is needed.

If you want to build a client app that automatically works with multiple versions of Hadoop, you can also package it with Spark and hadoop-client marked as "provided" and use spark-submit to put the Spark assembly on your cluster in the classpath. Then it will work with whatever version that was built against. But you need to specify hadoop-client when you run without spark-submit if you want to talk to the version of HDFS in your cluster (e.g. you're testing the app on your laptop and trying to make it read from HDFS).

> Spark master doesn't compile against hadoop-common trunk
> --------------------------------------------------------
>
>                 Key: SPARK-1518
>                 URL: https://issues.apache.org/jira/browse/SPARK-1518
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Marcelo Vanzin
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>
> FSDataOutputStream::sync() has disappeared from trunk in Hadoop;
> FileLogger.scala is calling it.
> I've changed it locally to hsync() so I can compile the code, but haven't
> checked yet whether those are equivalent. hsync() seems to have been there
> forever, so it hopefully works with all versions Spark cares about.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
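
For reference, the "provided" packaging described in the comment above might look roughly like the following sbt build fragment. This is only a sketch: the project name and the Scala, Spark, and Hadoop version numbers are placeholder assumptions rather than values taken from this thread, and should be adjusted to match the actual cluster.

```scala
// build.sbt -- illustrative sketch only.
// The name and all version numbers below are assumptions, not values from this thread.
name := "my-client-app"  // hypothetical project name

scalaVersion := "2.10.4" // assumed; should match the Scala version Spark was built with

libraryDependencies ++= Seq(
  // "provided": compile against Spark, but do not bundle it into the application jar.
  // spark-submit is expected to put the cluster's Spark assembly on the classpath.
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",

  // "provided": at runtime, use whatever Hadoop version the cluster's assembly was built against.
  "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
)
```

A jar built this way would be launched through spark-submit. When running without spark-submit (for example, testing on a laptop against HDFS), the "provided" scope on hadoop-client would have to be dropped and its version set to match the cluster's HDFS, as the comment notes.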