Hello Fokko,

I have put up a PR for the Scala update: https://github.com/apache/parquet-mr/pull/605
parquet-scrooge fails due to a Thrift parsing error, but parquet-scala succeeds with Scala 2.12. By dropping scrooge, we could at least move this forward.

Uwe

> On 29.01.2019 at 11:40, Nandor Kollar <[email protected]> wrote:
>
> Removing parquet-hive-* is a great idea, the code in Parquet is not
> maintained any more, it is just a burden there.
>
> As of parquet-pig, I'd prefer moving it to Pig (if Pig community accepts it
> as it is) instead of dropping it or moving to a separate project. I know
> people who still use Pig with Parquet.
>
> Regards,
> Nandor
>
>> On Mon, Jan 28, 2019 at 6:29 PM Ryan Blue <[email protected]> wrote:
>>
>> Hi everyone,
>>
>> I’m working on the 1.10.1 build and I’ve noticed that we will have several
>> modules that are not maintained or are very old. This includes all of the
>> Hive modules that moved into Hive years ago and also modules like
>> parquet-scrooge and parquet-scala that are based on Scala 2.10 that has
>> been EOL for years.
>>
>> We also have 2 command-line utilities, parquet-tools and parquet-cli. The
>> parquet-cli version is friendlier to use, but I’m clearly biased. In any
>> case, I don’t think we need to maintain both and it is confusing for users
>> to have two modules that do the same thing.
>>
>> I propose we remove the following modules:
>>
>> - parquet-hive-*
>> - parquet-scrooge
>> - parquet-scala
>> - parquet-tools
>> - parquet-hadoop-bundle (shaded deps)
>> - parquet-cascading (in favor of parquet-cascading3, if we keep it)
>>
>> There are also modules that I’m not sure about. Does anyone use these?
>>
>> - parquet-thrift
>> - parquet-pig
>> - parquet-cascading3
>>
>> Pig hasn’t had an update (other than project-wide changes) since Oct 2017.
>> I think it may be time to drop support in Pig and allow that to exist as a
>> separate project if anyone is still interested in it.
>>
>> In the last few years, we’ve moved more to a model where processing
>> frameworks and engines maintain their own integration. Spark, Presto,
>> Iceberg, and Hive fall into this category. So I would prefer to drop Pig
>> and Cascading3. I’m fine keeping thrift if people think it is useful.
>>
>> Thoughts?
>>
>> rb
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
