I'm going to -1 this given the number of small bug fixes that have gone into the release branch. I'll follow with another RC shortly.
On Tue, May 2, 2017 at 7:35 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> I won't +1 just given that it seems certain there will be another RC and there are the outstanding ML QA blocker issues.
>
> But clean build and test for JVM and Python tests LGTM on CentOS Linux 7.2.1511, OpenJDK 1.8.0_111.
>
> On Mon, 1 May 2017 at 22:42 Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
>
>> Hi Ryan,
>>
>> IMO, the problem is that the Spark Avro version conflicts with the Parquet Avro version. As discussed upthread, I don’t think there’s a way to *reliably* make sure that Avro 1.8 is on the classpath first while using spark-submit. Relocating Avro in our project wouldn’t solve the problem, because the NoSuchMethodError is thrown from the internals of ParquetAvroOutputFormat, not from code in our project.
>>
>> Regards,
>>
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>>
>> On May 1, 2017, at 12:33 PM, Ryan Blue <rb...@netflix.com> wrote:
>>
>> Michael, I think that the problem is with your classpath.
>>
>> Spark has a dependency on Avro 1.7.7, which can't be changed. Your project is what pulls in parquet-avro and, transitively, Avro 1.8. Spark has no runtime dependency on Avro 1.8. It is understandably annoying that using the same version of Parquet for your parquet-avro dependency is what causes your project to depend on Avro 1.8, but Spark's dependencies aren't a problem because its Parquet dependency doesn't bring in Avro.
>>
>> There are a few ways around this:
>> 1. Make sure Avro 1.8 is found on the classpath first
>> 2. Shade Avro 1.8 in your project (assuming Avro classes aren't shared)
>> 3. Use parquet-avro 1.8.1 in your project, which I think should work with 1.8.2 and avoid the Avro change
>>
>> The work-around in Spark is for tests, which do use parquet-avro.
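For option 1, a quick way to see which copy of a dependency actually "wins" is to ask the JVM where it loaded a class from. This is a minimal standalone sketch, not part of Spark; the `WhichJar` name and the suggestion to probe `org.apache.avro.Schema` on a live driver are illustrative assumptions:

```java
// WhichJar.java -- print the code-source location of a class, i.e. which
// jar (or directory) the JVM resolved it from on the current classpath.
public final class WhichJar {

    static String locationOf(String className) {
        try {
            java.security.CodeSource src =
                Class.forName(className).getProtectionDomain().getCodeSource();
            // JDK bootstrap classes report no code source.
            return src == null ? "<bootstrap/unknown>" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // On a Spark driver you would probe "org.apache.avro.Schema" to see
        // whether the 1.7.7 or the 1.8.0 jar was resolved first; the names
        // below just demonstrate the two non-Avro cases portably.
        System.out.println(locationOf("java.lang.String"));  // bootstrap class
        System.out.println(locationOf("no.such.Clazz"));     // missing class
    }
}
```

Running the same probe inside spark-shell or a spark-submit job shows which copy of a conflicting jar the driver and executors actually resolved.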
>> We can look at a Parquet 1.8.3 that avoids this issue, but I think this is reasonable for the 2.2.0 release.
>>
>> rb
>>
>> On Mon, May 1, 2017 at 12:08 PM, Michael Heuer <heue...@gmail.com> wrote:
>>
>>> Please excuse me if I'm misunderstanding -- the problem is not with our library or our classpath.
>>>
>>> There is a conflict within Spark itself, in that Parquet 1.8.2 expects to find Avro 1.8.0 on the runtime classpath and sees 1.7.7 instead. Spark already has to work around this for unit tests to pass.
>>>
>>> On Mon, May 1, 2017 at 2:00 PM, Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> Thanks for the extra context, Frank. I agree that it sounds like your problem comes from the conflict between your jars and what comes with Spark. It's the same concern that makes everyone shudder when anything has a public dependency on Jackson. :)
>>>>
>>>> What we usually do to get around situations like this is to relocate the problem library inside the shaded jar. That way, Spark uses its version of Avro and your classes use a different version of Avro. This works if you don't need to share classes between the two. Would that work for your situation?
>>>>
>>>> rb
>>>>
>>>> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> sounds like you are running into the fact that you cannot really put your classes before spark's on the classpath? spark's switches to support this never really worked for me either.
>>>>>
>>>>> inability to control the classpath + inconsistent jars => trouble?
>>>>>
>>>>> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
>>>>>
>>>>>> Hi Ryan,
>>>>>>
>>>>>> We do set Avro to 1.8 in our downstream project. We also set Spark as a provided dependency, and build an überjar. We run via spark-submit, which builds the classpath with our überjar and all of the Spark deps. This leads to Avro 1.7.7 getting picked up off the classpath at runtime, which causes the NoSuchMethodError to occur.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Frank Austin Nothaft
>>>>>> fnoth...@berkeley.edu
>>>>>> fnoth...@eecs.berkeley.edu
>>>>>> 202-340-0466
>>>>>>
>>>>>> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>>
>>>>>> Frank,
>>>>>>
>>>>>> The issue you're running into is caused by using parquet-avro with Avro 1.7. Can't your downstream project set the Avro dependency to 1.8? Spark can't update Avro because it is a breaking change that would force users to rebuild specific Avro classes in some cases. But you should be free to use Avro 1.8 to avoid the problem.
>>>>>>
>>>>>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft <fnoth...@berkeley.edu> wrote:
>>>>>>
>>>>>>> Hi Ryan et al,
>>>>>>>
>>>>>>> The issue we’ve seen using a build of the Spark 2.2.0 branch from a downstream project is that parquet-avro uses one of the new Avro 1.8.0 methods, and you get a NoSuchMethodError since Spark puts Avro 1.7.7 as a dependency. My colleague Michael (who posted earlier on this thread) documented this in SPARK-19697 <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark has unit tests that check this compatibility issue, but it looks like there was a recent change that sets a test-scope dependency on Avro 1.8.0 <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>, which masks this issue in the unit tests. With this error, you can’t use ParquetAvroOutputFormat from an application running on Spark 2.2.0.
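A cheap way to surface this kind of skew at startup, rather than deep inside a job, is a reflective probe for the method the newer library links against. This is a hedged sketch of our own, not a Spark or Parquet facility; the exact Avro 1.8 member that parquet-avro 1.8.2 calls is deliberately not named here, since its signature is not given in this thread:

```java
// MethodProbe.java -- fail fast if the classpath's copy of a library lacks
// a method that a downstream dependency will call through its binary API.
public final class MethodProbe {

    // True iff `className` is loadable and exposes a public method `method`
    // with the given parameter types.
    static boolean hasMethod(String className, String method, Class<?>... sig) {
        try {
            Class.forName(className).getMethod(method, sig);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In the scenario above you would probe the org.apache.avro class and
        // method that parquet-avro 1.8.2 links against; JDK classes stand in
        // here so the sketch runs anywhere.
        System.out.println(hasMethod("java.lang.String", "length"));      // true
        System.out.println(hasMethod("java.lang.String", "notAMethod"));  // false
    }
}
```

Calling such a probe in the driver before submitting work turns a mid-job NoSuchMethodError into an immediate, explainable failure.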
>>>>>>> Regards,
>>>>>>>
>>>>>>> Frank Austin Nothaft
>>>>>>> fnoth...@berkeley.edu
>>>>>>> fnoth...@eecs.berkeley.edu
>>>>>>> 202-340-0466
>>>>>>>
>>>>>>> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>>>>>
>>>>>>> I agree with Sean. Spark only pulls in parquet-avro for tests. For execution, it implements the record materialization APIs in Parquet to go directly to Spark SQL rows. This doesn't actually leak an Avro 1.8 dependency into Spark as far as I can tell.
>>>>>>>
>>>>>>> rb
>>>>>>>
>>>>>>> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> See discussion at https://github.com/apache/spark/pull/17163 -- I think the issue is that fixing this trades one problem for a slightly bigger one.
>>>>>>>>
>>>>>>>> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does not bump the dependency version for avro (currently at 1.7.7). Though perhaps not clear from the issue I reported [0], this means that Spark is internally inconsistent, in that a call through parquet (which depends on avro 1.8.0 [1]) may throw errors at runtime when it hits avro 1.7.7 on the classpath. Avro 1.8.0 is not binary compatible with 1.7.7.
>>>>>>>>>
>>>>>>>>> [0] - https://issues.apache.org/jira/browse/SPARK-19697
>>>>>>>>> [1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>>>>>>>>>
>>>>>>>>> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> I have one more issue that, if it needs to be fixed, needs to be fixed for 2.2.0.
>>>>>>>>>>
>>>>>>>>>> I'm fixing build warnings for the release and noticed that checkstyle actually complains that there are some Java methods named in TitleCase, like `ProcessingTimeTimeout`:
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/spark/pull/17803/files#r113934080
>>>>>>>>>>
>>>>>>>>>> Easy enough to fix, and it's right, that's not conventional. However, I wonder if it was done on purpose to match a class name?
>>>>>>>>>>
>>>>>>>>>> I think this is one for @tdas
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>>>>>>>
>>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>>
>>>>>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>>>>>
>>>>>>>>>>> The tag to be voted on is v2.2.0-rc1 <https://github.com/apache/spark/tree/v2.2.0-rc1> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>>>>>>>>>
>>>>>>>>>>> List of JIRA tickets resolved can be found with this filter <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>>>>>>>>>
>>>>>>>>>>> The release files, including signatures, digests, etc. can be found at: http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>>>>>>>>>
>>>>>>>>>>> Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>>>>>
>>>>>>>>>>> The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>>>>>>>>>
>>>>>>>>>>> The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>>>>>>>>>
>>>>>>>>>>> *FAQ*
>>>>>>>>>>>
>>>>>>>>>>> *How can I help test this release?*
>>>>>>>>>>>
>>>>>>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.
>>>>>>>>>>>
>>>>>>>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>>>>>>>
>>>>>>>>>>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else, please retarget to 2.3.0 or 2.2.1.
>>>>>>>>>>>
>>>>>>>>>>> *But my bug isn't fixed!??!*
>>>>>>>>>>>
>>>>>>>>>>> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.1.1.
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix