Hi Ryan! I think relocating the Avro dependency inside Spark would make a lot of sense. Otherwise, we'd need Spark to move to Avro 1.8.0, or Parquet to cut a new 1.8.3 release that either reverts to Avro 1.7.7 or eliminates the code that is binary-incompatible between Avro 1.7.7 and 1.8.0.
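For reference, the relocation idea discussed in this thread looks roughly like the following in a downstream sbt build using the sbt-assembly plugin (Spark itself builds with Maven, where the maven-shade-plugin's relocation element plays the same role). This is a sketch, not a tested configuration; the `shaded.` prefix is an arbitrary choice:

```scala
// build.sbt fragment -- assumes the sbt-assembly plugin is enabled.
// Rewrites org.apache.avro.* into a private package inside the überjar,
// so the application's Avro 1.8.0 classes cannot collide with the
// Avro 1.7.7 that spark-submit puts on the classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule
    .rename("org.apache.avro.**" -> "shaded.org.apache.avro.@1")
    .inAll
)
```

As Ryan notes below, this only works if the application does not need to share Avro classes with Spark itself.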
Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 12:00 PM, Ryan Blue <rb...@netflix.com> wrote:
>
> Thanks for the extra context, Frank. I agree that it sounds like your problem
> comes from the conflict between your jars and what comes with Spark. It's the
> same concern that makes everyone shudder when anything has a public
> dependency on Jackson. :)
>
> What we usually do to get around situations like this is to relocate the
> problem library inside the shaded jar. That way, Spark uses its version of
> Avro and your classes use a different version of Avro. This works if you
> don't need to share classes between the two. Would that work for your
> situation?
>
> rb
>
> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers <ko...@tresata.com> wrote:
> Sounds like you are running into the fact that you cannot really put your
> classes before Spark's on the classpath? Spark's switches to support this
> never really worked for me either.
>
> inability to control the classpath + inconsistent jars => trouble?
>
> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
> Hi Ryan,
>
> We do set Avro to 1.8 in our downstream project. We also set Spark as a
> provided dependency and build an überjar. We run via spark-submit, which
> builds the classpath with our überjar and all of the Spark deps. This leads
> to Avro 1.7.7 getting picked off of the classpath at runtime, which causes
> the NoSuchMethodError to occur.
>
> Regards,
>
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>
>> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
>>
>> Frank,
>>
>> The issue you're running into is caused by using parquet-avro with Avro 1.7.
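Setting Avro to 1.8 in the downstream build, as Frank describes, can be forced even against transitive 1.7.7 pulls. A sketch, assuming an sbt build (in Maven, a dependencyManagement entry does the same job):

```scala
// build.sbt fragment -- pin Avro to 1.8.0 so parquet-avro's requirement
// wins over any 1.7.x version pulled in transitively.
// Note: per Frank's report, this does not help when spark-submit still
// places Spark's own Avro 1.7.7 ahead of the application jar at runtime.
dependencyOverrides += "org.apache.avro" % "avro" % "1.8.0"
```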
>> Can't your downstream project set the Avro dependency to 1.8? Spark can't
>> update Avro because it is a breaking change that would force users to
>> rebuild specific Avro classes in some cases. But you should be free to use
>> Avro 1.8 to avoid the problem.
>>
>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
>> Hi Ryan et al,
>>
>> The issue we've seen using a build of the Spark 2.2.0 branch from a
>> downstream project is that parquet-avro uses one of the new Avro 1.8.0
>> methods, and you get a NoSuchMethodError since Spark declares Avro 1.7.7
>> as a dependency. My colleague Michael (who posted earlier on this thread)
>> documented this in SPARK-19697
>> <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark has
>> unit tests that check this compatibility issue, but it looks like there was
>> a recent change that sets a test-scope dependency on Avro 1.8.0
>> <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
>> which masks this issue in the unit tests. With this error, you can't use
>> ParquetAvroOutputFormat from an application running on Spark 2.2.0.
>>
>> Regards,
>>
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>>
>>> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>
>>> I agree with Sean. Spark only pulls in parquet-avro for tests. For
>>> execution, it implements the record materialization APIs in Parquet to go
>>> directly to Spark SQL rows. This doesn't actually leak an Avro 1.8
>>> dependency into Spark as far as I can tell.
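One way to confirm which Avro jar actually wins at runtime, in the situation Frank describes, is to ask the JVM where a class was loaded from. `jarOf` here is a hypothetical helper, not part of any Spark or Avro API:

```scala
// Sketch: report the jar a class was loaded from, useful for diagnosing
// "wrong version on the classpath" problems like Avro 1.7.7 vs 1.8.0.
// Classes loaded by the bootstrap class loader (e.g. java.lang.String)
// have no CodeSource, so the result is an Option.
def jarOf(className: String): Option[String] =
  Option(Class.forName(className).getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)

// In a Spark job one might call, e.g.:
// jarOf("org.apache.avro.Schema").foreach(println)
```

If the printed path points at Spark's bundled Avro 1.7.7 jar rather than the überjar, the runtime classpath ordering is the problem, matching Frank's report.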
>>>
>>> rb
>>>
>>> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>> See the discussion at https://github.com/apache/spark/pull/17163 -- I
>>> think the issue is that fixing this trades one problem for a slightly
>>> bigger one.
>>>
>>> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com> wrote:
>>> Version 2.2.0 bumps the dependency version for Parquet to 1.8.2 but does
>>> not bump the dependency version for Avro (currently at 1.7.7). Though
>>> perhaps not clear from the issue I reported [0], this means that Spark is
>>> internally inconsistent, in that a call through Parquet (which depends on
>>> Avro 1.8.0 [1]) may throw errors at runtime when it hits Avro 1.7.7 on the
>>> classpath. Avro 1.8.0 is not binary-compatible with 1.7.7.
>>>
>>> [0] https://issues.apache.org/jira/browse/SPARK-19697
>>> [1] https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>>>
>>> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>> I have one more issue that, if it needs to be fixed, needs to be fixed
>>> for 2.2.0.
>>>
>>> I'm fixing build warnings for the release and noticed that checkstyle
>>> actually complains that there are some Java methods named in TitleCase,
>>> like `ProcessingTimeTimeout`:
>>>
>>> https://github.com/apache/spark/pull/17803/files#r113934080
>>>
>>> Easy enough to fix, and it's right -- that's not conventional. However,
>>> I wonder whether it was done on purpose to match a class name?
>>>
>>> I think this is one for @tdas
>>>
>>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com> wrote:
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc1
>>> <https://github.com/apache/spark/tree/v2.2.0-rc1>
>>> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>
>>> The list of JIRA tickets resolved can be found with this filter
>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>
>>> The release files, including signatures and digests, can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>
>>> FAQ
>>>
>>> How can I help test this release?
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload, running it on this release candidate, and
>>> reporting any regressions.
>>>
>>> What should happen to JIRA tickets still targeting 2.2.0?
>>>
>>> Committers should look at those and triage. Extremely important bug fixes,
>>> documentation, and API tweaks that impact compatibility should be worked
>>> on immediately. Everything else, please retarget to 2.3.0 or 2.2.1.
>>>
>>> But my bug isn't fixed!??!
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix