I ran into a weird bug today where trying to read a file from HDFS on a cluster built with Hadoop 2 gives an error saying "No FileSystem for scheme: hdfs". Specifically, this only seems to happen when the application is packaged as an assembly jar, and not when it is run through sbt's run-main.
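For context, the failing read boils down to something like the sketch below (the master URL and HDFS path here are placeholders rather than the exact values used by the test script):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object SparkHdfsSketch {
      def main(args: Array[String]) {
        // Placeholder master URL and app name -- substitute the values for your cluster.
        val sc = new SparkContext("spark://<master>:7077", "Simple HDFS App")
        // Reading any hdfs:// path from inside the assembly jar is what triggers
        // "java.io.IOException: No FileSystem for scheme: hdfs"
        val lines = sc.textFile("hdfs://<namenode>:9000/tmp/test.txt")
        println("Lines count: " + lines.count())
        sc.stop()
      }
    }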
The project's setup[0] is pretty simple and is only a slight modification of the project used by the release audit tool. The sbt assembly instructions[1] are mostly copied from Spark's sbt build files. We run into this in SparkR as well, so it would be great if anybody has an idea on how to debug this.

To reproduce, you can do the following:

1. Launch a Spark EC2 cluster with 0.9.0 with --hadoop-major-version=2
2. Clone https://github.com/shivaram/spark-utils
3. Run release-audits/sbt_app_core/run-hdfs-test.sh

Thanks
Shivaram

[0] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/src/main/scala/SparkHdfsApp.scala
[1] https://github.com/shivaram/spark-utils/blob/master/release-audits/sbt_app_core/build.sbt