Hi everyone,

I’m working on the 1.10.1 build and I’ve noticed that we have several modules that are either unmaintained or very old. This includes all of the Hive modules, which moved into Hive years ago, and modules like parquet-scrooge and parquet-scala, which are based on Scala 2.10, a version that has been EOL for years.

We also have two command-line utilities, parquet-tools and parquet-cli. The parquet-cli version is friendlier to use, but I’m clearly biased. In any case, I don’t think we need to maintain both, and it is confusing for users to have two modules that do the same thing.

I propose we remove the following modules:

- parquet-hive-*
- parquet-scrooge
- parquet-scala
- parquet-tools
- parquet-hadoop-bundle (shaded deps)
- parquet-cascading (in favor of parquet-cascading3, if we keep it)

There are also modules that I’m not sure about. Does anyone use these?

- parquet-thrift
- parquet-pig
- parquet-cascading3

Pig hasn’t had an update (other than project-wide changes) since Oct 2017. I think it may be time to drop support for Pig and allow that to exist as a separate project if anyone is still interested in it. In the last few years, we’ve moved more to a model where processing frameworks and engines maintain their own integration. Spark, Presto, Iceberg, and Hive fall into this category. So I would prefer to drop Pig and Cascading3. I’m fine keeping thrift if people think it is useful.

Thoughts?

rb

--
Ryan Blue
Software Engineer
Netflix
