I downloaded the prebuilt package labeled "Spark 2.1.1 prebuilt with Hadoop 2.7 or later" from the direct download link on spark.apache.org.
However, I am seeing compatibility errors when running against a deployed HDFS 2.7.3 (see my earlier message about a Flume DStream producing 0 records after an HDFS node restart). Digging into this, I have started to suspect a version mismatch between the Hadoop server and client, so I looked at Spark 2.1.1's pom.xml. It sets hadoop.version to 2.2.0. There seems to be a mismatch here, and I am not sure whether it is the root cause of the issues I have been seeing.

Can someone please confirm whether the package mentioned above was indeed compiled against Hadoop 2.7? Or should I fall back to an HDFS 2.2 server instead?

Thanks,
N B
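P.S. One thing I learned while digging: the hadoop.version at the top of a pom.xml is only the Maven default, and a build profile (e.g. something like -Phadoop-2.7, if that is what the release build used) can override it at build time, so the 2.2.0 value may not reflect what the binaries were actually compiled against. A small Python sketch of that override logic, using a simplified, hypothetical pom fragment (not the actual Spark pom):

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative pom fragment -- NOT the real Spark pom.xml.
# It mimics a default hadoop.version plus a profile that overrides it.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <hadoop.version>2.2.0</hadoop.version>
  </properties>
  <profiles>
    <profile>
      <id>hadoop-2.7</id>
      <properties>
        <hadoop.version>2.7.3</hadoop.version>
      </properties>
    </profile>
  </profiles>
</project>"""

NS = {"m": "http://maven.apache.org/POM/4.0.0"}

def effective_hadoop_version(pom_xml, active_profile=None):
    """Return hadoop.version after applying an optionally activated profile."""
    root = ET.fromstring(pom_xml)
    # Default value from the top-level <properties> block.
    version = root.findtext("m:properties/m:hadoop.version", namespaces=NS)
    # A matching active profile's <properties> override the default.
    for profile in root.findall("m:profiles/m:profile", NS):
        if profile.findtext("m:id", namespaces=NS) == active_profile:
            override = profile.findtext(
                "m:properties/m:hadoop.version", namespaces=NS)
            if override:
                version = override
    return version

print(effective_hadoop_version(POM))               # default value
print(effective_hadoop_version(POM, "hadoop-2.7"))  # profile override
```

So the question stands: which profile (if any) was active when the downloadable package was built?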