Awesome work, Ismaël. Let me know if I can help anywhere.

Cheers,
Fokko
On Mon, Feb 17, 2020 at 16:00, Ismaël Mejía <[email protected]> wrote:
> I had not tried the Spark upgrade since I assumed it would fail
> because of Hive's transitive dependencies.
>
> But I decided to give it a shot today. Luckily, the Spark code base is quite
> Avro-friendly, so code-wise it was 'easy'.
>
> Of course it is still failing, but you can refer to it in the other
> PR.
> And if you can find fixes for any of the pending issues, that would be great.
>
> https://github.com/apache/spark/pull/27609
>
> Regards,
> Ismaël
>
>
> On Fri, Feb 14, 2020 at 5:42 PM Michael Heuer <[email protected]> wrote:
>
> > Hello Ismaël,
> >
> > Might you be able to share a link to your patch for Spark? I would like
> > to try to apply it on top of
> >
> > https://github.com/apache/spark/pull/26804
> >
> > which attempts to upgrade the Parquet dependency for Spark to 1.11.0.
> >
> > Thank you,
> >
> > michael
> >
> >
> > > On Feb 14, 2020, at 10:30 AM, Ismaël Mejía <[email protected]> wrote:
> > >
> > > Ah, lovely question.
> > >
> > > tl;dr version:
> > > Spark depends on Hive, so Hive should be upgraded first.
> > > Spark depends on two versions of Hive: a Spark fork of Hive 1.x and
> > > upstream Hive 2.x.
> > > Upgrading the first is not even being discussed at the moment. For the
> > > second, I added a patch that passes all tests if you run it against
> > > Spark 2.4/master, but Hive uses a forked version of Spark 2.3 to run its
> > > tests (YES, CIRCULAR DEPENDENCY!!!)
> > >
> > > One extra point pushing things in the right direction is that Parquet
> > > and Iceberg have already moved to Avro 1.9.x, so pressure to move is
> > > growing. It is still a mess, but we want to put up a fight. One thing is
> > > sure: it won't make Spark 3.0.0; best case is 3.1.x, and that also
> > > depends on the goodwill of the Hive contributors, who have ignored my
> > > emails + patches for some time.
> > >
> > > https://lists.apache.org/thread.html/rc6c672ad4a5e255957d54d80ff83bf48eacece2828a86bc6cedd9c4c%40%3Cdev.hive.apache.org%3E
> > >
> > > For the full details of the saga:
> > > https://issues.apache.org/jira/browse/SPARK-27733
> > > https://issues.apache.org/jira/browse/HIVE-21737
> > >
> > >
> > > On Fri, Feb 14, 2020 at 5:04 PM Michael Heuer <[email protected]> wrote:
> > >
> > >> Hello,
> > >>
> > >> I wonder if any Avro devs might be willing to help push a PR for Apache
> > >> Spark to update the Avro dependency from 1.8.2 to 1.9.2?
> > >>
> > >> I foresee some trouble with binary-incompatible code changes and
> > >> dependency version conflicts, and could use some additional support.
> > >>
> > >> Thank you in advance,
> > >>
> > >> michael
