That is true. Shall we move this forward by creating Jira tickets for dropping
the modules? Then we can continue the discussion on the tickets themselves.

Personally, I would suggest dropping the following:
- parquet-hive-*
- parquet-hadoop-bundle (shaded deps)
- parquet-cascading
- parquet-pig
- parquet-scrooge
- parquet-scala

We need to decide on:
- parquet-thrift
- parquet-cascading3

I couldn't find any statistics on the number of downloads.

Cheers, Fokko

On Tue, Jan 29, 2019 at 17:58, Ryan Blue <[email protected]> wrote:

> I don't think we need a major release to drop modules. Those modules will
> just not be released. That fits with semantic versioning because the line
> of versions stops.
>
> On Tue, Jan 29, 2019 at 3:37 AM Gabor Szadovszky <[email protected]> wrote:
>
> > Hi,
> >
> > I agree with Fokko. It would be nice to drop these modules, but only in
> > the next major release.
> >
> > On Tue, Jan 29, 2019 at 11:57 AM Uwe L. Korn <[email protected]> wrote:
> >
> > > Hello Fokko,
> > >
> > > I have put up a PR for the Scala update:
> > > https://github.com/apache/parquet-mr/pull/605. parquet-scrooge fails
> > > due to a Thrift parsing error, but parquet-scala succeeds with Scala
> > > 2.12. By dropping scrooge, we could at least move this forward.
> > >
> > > Uwe
> > >
> > > > On 29.01.2019 at 11:40, Nandor Kollar <[email protected]> wrote:
> > > >
> > > > Removing parquet-hive-* is a great idea. The code in Parquet is not
> > > > maintained any more; it is just a burden there.
> > > >
> > > > As for parquet-pig, I'd prefer moving it to Pig (if the Pig community
> > > > accepts it as it is) instead of dropping it or moving it to a separate
> > > > project. I know people who still use Pig with Parquet.
> > > >
> > > > Regards,
> > > > Nandor
> > > >
> > > >> On Mon, Jan 28, 2019 at 6:29 PM Ryan Blue <[email protected]> wrote:
> > > >>
> > > >> Hi everyone,
> > > >>
> > > >> I’m working on the 1.10.1 build and I’ve noticed that we will have
> > > >> several modules that are not maintained or are very old. This includes
> > > >> all of the Hive modules that moved into Hive years ago and also modules
> > > >> like parquet-scrooge and parquet-scala that are based on Scala 2.10,
> > > >> which has been EOL for years.
> > > >>
> > > >> We also have 2 command-line utilities, parquet-tools and parquet-cli.
> > > >> The parquet-cli version is friendlier to use, but I’m clearly biased. In
> > > >> any case, I don’t think we need to maintain both and it is confusing for
> > > >> users to have two modules that do the same thing.
> > > >>
> > > >> I propose we remove the following modules:
> > > >>
> > > >>   - parquet-hive-*
> > > >>   - parquet-scrooge
> > > >>   - parquet-scala
> > > >>   - parquet-tools
> > > >>   - parquet-hadoop-bundle (shaded deps)
> > > >>   - parquet-cascading (in favor of parquet-cascading3, if we keep it)
> > > >>
> > > >> There are also modules that I’m not sure about. Does anyone use these?
> > > >>
> > > >>   - parquet-thrift
> > > >>   - parquet-pig
> > > >>   - parquet-cascading3
> > > >>
> > > >> Pig hasn’t had an update (other than project-wide changes) since Oct
> > > >> 2017. I think it may be time to drop support for Pig and allow that to
> > > >> exist as a separate project if anyone is still interested in it.
> > > >>
> > > >> In the last few years, we’ve moved more to a model where processing
> > > >> frameworks and engines maintain their own integration. Spark, Presto,
> > > >> Iceberg, and Hive fall into this category. So I would prefer to drop
> > > >> Pig and Cascading3. I’m fine keeping thrift if people think it is
> > > >> useful.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> rb
> > > >> --
> > > >> Ryan Blue
> > > >> Software Engineer
> > > >> Netflix
> > > >>
> > >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
