That is true. Shall we move this forward by creating Jira tickets for dropping the modules? Then we can have further discussion on the tickets themselves.
I would suggest dropping the following:

- parquet-hive-*
- parquet-hadoop-bundle (shaded deps)
- parquet-cascading
- parquet-pig
- parquet-scrooge
- parquet-scala

We need to decide on:

- parquet-thrift
- parquet-cascading3

I couldn't find any statistics on the number of downloads.

Cheers,
Fokko

On Tue, Jan 29, 2019 at 17:58, Ryan Blue <[email protected]> wrote:

> I don't think we need a major release to drop modules. Those modules will just not be released. That fits with semantic versioning because the line of versions stops.
>
> On Tue, Jan 29, 2019 at 3:37 AM Gabor Szadovszky <[email protected]> wrote:
>
> > Hi,
> >
> > I agree with Fokko. It would be nice to drop these modules but only in the next major release.
> >
> > On Tue, Jan 29, 2019 at 11:57 AM Uwe L. Korn <[email protected]> wrote:
> >
> > > Hello Fokko,
> > >
> > > I have put up a PR for the Scala update: https://github.com/apache/parquet-mr/pull/605. parquet-scrooge fails due to a Thrift parsing error, but parquet-scala succeeds with Scala 2.12. By dropping scrooge, we could at least move this forward.
> > >
> > > Uwe
> > >
> > > > On Jan 29, 2019, at 11:40, Nandor Kollar <[email protected]> wrote:
> > > >
> > > > Removing parquet-hive-* is a great idea, the code in Parquet is not maintained any more, it is just a burden there.
> > > >
> > > > As for parquet-pig, I'd prefer moving it to Pig (if the Pig community accepts it as it is) instead of dropping it or moving it to a separate project. I know people who still use Pig with Parquet.
> > > >
> > > > Regards,
> > > > Nandor
> > > >
> > > >> On Mon, Jan 28, 2019 at 6:29 PM Ryan Blue <[email protected]> wrote:
> > > >>
> > > >> Hi everyone,
> > > >>
> > > >> I’m working on the 1.10.1 build and I’ve noticed that we will have several modules that are not maintained or are very old. This includes all of the Hive modules that moved into Hive years ago and also modules like parquet-scrooge and parquet-scala that are based on Scala 2.10 that has been EOL for years.
> > > >>
> > > >> We also have 2 command-line utilities, parquet-tools and parquet-cli. The parquet-cli version is friendlier to use, but I’m clearly biased. In any case, I don’t think we need to maintain both and it is confusing for users to have two modules that do the same thing.
> > > >>
> > > >> I propose we remove the following modules:
> > > >>
> > > >> - parquet-hive-*
> > > >> - parquet-scrooge
> > > >> - parquet-scala
> > > >> - parquet-tools
> > > >> - parquet-hadoop-bundle (shaded deps)
> > > >> - parquet-cascading (in favor of parquet-cascading3, if we keep it)
> > > >>
> > > >> There are also modules that I’m not sure about. Does anyone use these?
> > > >>
> > > >> - parquet-thrift
> > > >> - parquet-pig
> > > >> - parquet-cascading3
> > > >>
> > > >> Pig hasn’t had an update (other than project-wide changes) since Oct 2017. I think it may be time to drop support in Pig and allow that to exist as a separate project if anyone is still interested in it.
> > > >>
> > > >> In the last few years, we’ve moved more to a model where processing frameworks and engines maintain their own integration. Spark, Presto, Iceberg, and Hive fall into this category. So I would prefer to drop Pig and Cascading3. I’m fine keeping thrift if people think it is useful.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> rb
> > > >> --
> > > >> Ryan Blue
> > > >> Software Engineer
> > > >> Netflix
>
> --
> Ryan Blue
> Software Engineer
> Netflix
