Re: 1), I think we tried to fix that on the build side, but it requires flags that not all tar versions (e.g. the one on OS X) have. But that's tangential.
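For what it's worth, a portable way to do the ownership fix as a post-processing step, without depending on tar flags, might look something like the sketch below. It uses Python's standard tarfile module. The function name normalize_ownership and the output file name are made up for illustration, and this is not what the Spark build does.

```python
import tarfile


def normalize_ownership(src_path, dst_path):
    """Copy a .tar.gz archive, setting every entry's owner to uid/gid 0/0."""
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            member.uid = 0
            member.gid = 0
            member.uname = "root"
            member.gname = "root"
            # Drop any pax extended-header overrides so the values set above win.
            for key in ("uid", "gid", "uname", "gname"):
                member.pax_headers.pop(key, None)
            # Only regular files carry data; directories and links are header-only.
            fileobj = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, fileobj)


if __name__ == "__main__":
    # Input name is the distribution from this thread; output name is hypothetical.
    normalize_ownership(
        "spark-2.4.3-bin-without-hadoop-scala-2.12.tgz",
        "spark-2.4.3-bin-without-hadoop-scala-2.12-root.tgz",
    )
```

This rewrites the archive rather than patching it in place, so the original tarball stays untouched and its checksum no longer applies to the copy.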

I think the Avro + Parquet dependency situation is generally problematic -- see JIRA for some details. But yes I'm not surprised if Spark has a different version from Hadoop 2.7.x and that would cause problems -- if using Avro. I'm not sure the mistake is that the JARs are missing, as I think this is supposed to be a 'provided' dependency, but I haven't looked into it. If there's any easy obvious correction to be made there, by all means.

Not sure what the deal is with jline... I'd expect that's in the "hadoop-provided" distro? That one may be a real issue if it's considered provided but isn't used that way.

On Mon, May 20, 2019 at 4:15 PM Koert Kuipers <ko...@tresata.com> wrote:
>
> we run it without issues on hadoop 2.6 - 2.8 on top of my head.
>
> we however do some post-processing on the tarball:
> 1) we fix the ownership of the files inside the tar.gz file (should be
> uid/gid 0/0, otherwise untarring by root can lead to ownership by unknown
> user).
> 2) add avro-1.8.2.jar and jline-2.14.6.jar to jars folder. i believe these
> jars missing in provided profile is simply a mistake.
>
> best,
> koert
>
> On Mon, May 20, 2019 at 3:37 PM Michael Heuer <heue...@gmail.com> wrote:
>>
>> Hello,
>>
>> Which Hadoop version or versions are compatible with Spark 2.4.3 and Scala
>> 2.12?
>>
>> The binary distribution spark-2.4.3-bin-without-hadoop-scala-2.12.tgz is
>> missing avro-1.8.2.jar, so when attempting to run with Hadoop 2.7.7 there
>> are classpath conflicts at runtime, as Hadoop 2.7.7 includes avro-1.7.4.jar.
>>
>> https://issues.apache.org/jira/browse/SPARK-27781
>>
>> michael