Hi Ryan!

I think relocating the Avro dependency inside Spark would make a lot of 
sense. Otherwise, we’d need Spark to move to Avro 1.8.0, or Parquet to cut a 
new 1.8.3 release that either reverts to Avro 1.7.7 or eliminates the code 
that is binary incompatible between Avro 1.7.7 and 1.8.0.
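
To check which Avro jar actually wins on the classpath, before and after any relocation, one option is to ask the JVM where a class was loaded from. What follows is only a minimal sketch using standard JDK calls. `ClasspathProbe` and its method names are hypothetical and not part of Spark, Parquet, or Avro, and `org.apache.avro.Schema` is just an assumed default class to look up.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical helper (not part of Spark, Parquet, or Avro): reports which
// jar or directory supplied a class, and whether a public method with a
// given name is visible on it.
final class ClasspathProbe {
    private ClasspathProbe() {}

    // Location the class was loaded from; bootstrap classes have no code source.
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    // Loads the class without running its static initializers; null if it
    // cannot be found or linked.
    private static Class<?> load(String className) {
        try {
            return Class.forName(className, false, ClasspathProbe.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static String describe(String className) {
        Class<?> cls = load(className);
        return cls == null
            ? className + " is not on the classpath"
            : className + " <- " + locationOf(cls);
    }

    // True if the loaded class exposes a public method with this name.
    public static boolean hasMethod(String className, String methodName) {
        Class<?> cls = load(className);
        if (cls == null) {
            return false;
        }
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed default; pass another class name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.avro.Schema";
        System.out.println(describe(className));
        if (args.length > 1) {
            System.out.println(args[1] + " present: " + hasMethod(className, args[1]));
        }
    }
}
```

Called from inside the job (so that it sees the same classpath spark-submit builds), a printed location that points at a Spark jar rather than the überjar would suggest that Spark's copy of the class took precedence.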

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 12:00 PM, Ryan Blue <rb...@netflix.com> wrote:
> 
> Thanks for the extra context, Frank. I agree that it sounds like your problem 
> comes from the conflict between your Jars and what comes with Spark. It's the 
> same concern that makes everyone shudder when anything has a public 
> dependency on Jackson. :)
> 
> What we usually do to get around situations like this is to relocate the 
> problem library inside the shaded Jar. That way, Spark uses its version of 
> Avro and your classes use a different version of Avro. This works if you 
> don't need to share classes between the two. Would that work for your 
> situation?
> 
> rb
> 
> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers <ko...@tresata.com> wrote:
> sounds like you are running into the fact that you cannot really put your 
> classes before spark's on classpath? spark's switches to support this never 
> really worked for me either.
> 
> inability to control the classpath + inconsistent jars => trouble ?
> 
> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
> Hi Ryan,
> 
> We do set Avro to 1.8 in our downstream project. We also set Spark as a 
> provided dependency, and build an überjar. We run via spark-submit, which 
> builds the classpath with our überjar and all of the Spark deps. This leads 
> to Avro 1.7.7 getting picked off of the classpath at runtime, which causes 
> the NoSuchMethodError to occur.
> 
> Regards,
> 
> Frank Austin Nothaft
> fnoth...@berkeley.edu
> fnoth...@eecs.berkeley.edu
> 202-340-0466
>> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
>> 
>> Frank,
>> 
>> The issue you're running into is caused by using parquet-avro with Avro 1.7. 
>> Can't your downstream project set the Avro dependency to 1.8? Spark can't 
>> update Avro because it is a breaking change that would force users to 
>> rebuild specific Avro classes in some cases. But you should be free to use 
>> Avro 1.8 to avoid the problem.
>> 
>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
>> Hi Ryan et al,
>> 
>> The issue we’ve seen using a build of the Spark 2.2.0 branch from a 
>> downstream project is that parquet-avro uses one of the new Avro 1.8.0 
>> methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a 
>> dependency. My colleague Michael (who posted earlier on this thread) 
>> documented this in SPARK-19697 
>> <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark has 
>> unit tests that check this compatibility issue, but it looks like there was 
>> a recent change that sets a test scope dependency on Avro 1.8.0 
>> <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
>> which masks this issue in the unit tests. With this error, you can’t use 
>> the ParquetAvroOutputFormat from an application running on Spark 2.2.0.
>> 
>> Regards,
>> 
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>> 
>>> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>> 
>>> I agree with Sean. Spark only pulls in parquet-avro for tests. For 
>>> execution, it implements the record materialization APIs in Parquet to go 
>>> directly to Spark SQL rows. This doesn't actually leak an Avro 1.8 
>>> dependency into Spark as far as I can tell.
>>> 
>>> rb
>>> 
>>> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>> See discussion at https://github.com/apache/spark/pull/17163 -- I think 
>>> the issue is that fixing this trades one problem for a slightly bigger one.
>>> 
>>> 
>>> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com> wrote:
>>> Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does 
>>> not bump the dependency version for avro (currently at 1.7.7).  Though 
>>> perhaps not clear from the issue I reported [0], this means that Spark is 
>>> internally inconsistent, in that a call through parquet (which depends on 
>>> avro 1.8.0 [1]) may throw errors at runtime when it hits avro 1.7.7 on the 
>>> classpath.  Avro 1.8.0 is not binary compatible with 1.7.7.
>>> 
>>> [0] - https://issues.apache.org/jira/browse/SPARK-19697
>>> [1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>>> 
>>> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>> I have one more issue that, if it needs to be fixed, needs to be fixed for 
>>> 2.2.0.
>>> 
>>> I'm fixing build warnings for the release and noticed that checkstyle 
>>> actually complains there are some Java methods named in TitleCase, like 
>>> `ProcessingTimeTimeout`:
>>> 
>>> https://github.com/apache/spark/pull/17803/files#r113934080
>>> 
>>> Easy enough to fix and it's right, that's not conventional. However I 
>>> wonder if it was done on purpose to match a class name?
>>> 
>>> I think this is one for @tdas
>>> 
>>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com> wrote:
>>> Please vote on releasing the following candidate as Apache Spark version 
>>> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes 
>>> if a majority of at least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>> 
>>> 
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>> 
>>> The tag to be voted on is v2.2.0-rc1 
>>> <https://github.com/apache/spark/tree/v2.2.0-rc1> 
>>> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>> 
>>> List of JIRA tickets resolved can be found with this filter 
>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>> 
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>> 
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>> 
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>> 
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>> 
>>> 
>>> FAQ
>>> 
>>> How can I help test this release?
>>> 
>>> If you are a Spark user, you can help us test this release by taking an 
>>> existing Spark workload and running on this release candidate, then 
>>> reporting any regressions.
>>> 
>>> What should happen to JIRA tickets still targeting 2.2.0?
>>> 
>>> Committers should look at those and triage. Extremely important bug fixes, 
>>> documentation, and API tweaks that impact compatibility should be worked on 
>>> immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>> 
>>> But my bug isn't fixed!??!
>>> 
>>> In order to make timely releases, we will typically not hold the release 
>>> unless the bug in question is a regression from 2.1.1.
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>> 
>> 
>> 
>> 
>> -- 
>> Ryan Blue
>> Software Engineer
>> Netflix
> 
> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
