First, thanks for all the effort and contributions to such a useful software stack! Spark is great!
I have been using the git tags for v1.2.0-rc1 and v1.2.0-rc2, built as follows:

    ./make-distribution.sh -Dhadoop.version=2.5.0-cdh5.2.0 -Dyarn.version=2.5.0-cdh5.2.0 -Phadoop-2.4 -Phive -Pyarn -Phive-thriftserver

I have been starting the thriftserver as follows:

    HADOOP_CONF_DIR=/etc/hadoop/conf ./sbin/start-thriftserver.sh --master yarn --num-executors 16

Under v1.2.0-rc1 and v1.2.0-rc2 this worked properly: the thriftserver starts up and I am able to interact with it and execute queries as expected using the JDBC driver.

I have updated to git tag v1.2.0, built identically and started the thriftserver identically, but am now running into the following issue on startup:

    Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://myhdfs/user/user/.sparkStaging/application_1416150945509_0055/datanucleus-api-jdo-3.2.6.jar, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:519)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
        at org.apache.spark.deploy.yarn.ClientDistributedCacheManager.addResource(ClientDistributedCacheManager.scala:67)
        at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:257)
        at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:242)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:242)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
        at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:350)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:140)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:335)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:38)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
        at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Looking at SPARK-4757, it appears others were seeing this behavior in earlier releases and it is fixed in v1.2.0, whereas I did not see the behavior in earlier releases and now am seeing it in v1.2.0.
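In case it helps with diagnosis: the "expected: file:///" part looks to me like the hdfs:// staging path is being handed to the local filesystem implementation, and I'm assuming the YARN client should instead be resolving it against the cluster's default filesystem. This is how I've been sanity-checking the client configuration on the launching machines:

    hdfs getconf -confKey fs.defaultFS

As far as I can tell this comes back with the expected hdfs://myhdfs URI on both clusters, so the client configuration itself looks sane.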
I have tested this with the exact same build/launch commands on two separate CDH5.2.0 clusters with identical results. Both machines where the build and execution take place have a proper HDFS/YARN client configuration in /etc/hadoop/conf, and other Hadoop tools such as MR2 on YARN function as expected. Any ideas on what to do to resolve this issue? Thanks!

-matt
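P.S. I'm not sure this is the right knob, but as a workaround experiment I'm planning to try forcing the default filesystem on the submit side via a spark.hadoop.* property (my understanding is that those properties get copied into the Hadoop Configuration used by the client), e.g.:

    HADOOP_CONF_DIR=/etc/hadoop/conf ./sbin/start-thriftserver.sh --master yarn \
        --num-executors 16 \
        --conf spark.hadoop.fs.defaultFS=hdfs://myhdfs

I'll report back whether that changes anything.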