Hello Fokko, 

I have put up a PR for the Scala update: 
https://github.com/apache/parquet-mr/pull/605. parquet-scrooge fails due to a 
Thrift parsing error, but parquet-scala succeeds with Scala 2.12. By dropping 
scrooge, we could at least move this forward.

Uwe

> On 29.01.2019 at 11:40, Nandor Kollar <[email protected]> wrote:
> 
> Removing parquet-hive-* is a great idea. The code in Parquet is not
> maintained any more; it is just a burden there.
> 
> As for parquet-pig, I'd prefer moving it to Pig (if the Pig community accepts it
> as it is) instead of dropping it or moving to a separate project. I know
> people who still use Pig with Parquet.
> 
> Regards,
> Nandor
> 
>> On Mon, Jan 28, 2019 at 6:29 PM Ryan Blue <[email protected]> wrote:
>> 
>> Hi everyone,
>> 
>> I’m working on the 1.10.1 build and I’ve noticed that we will have several
>> modules that are not maintained or are very old. This includes all of the
>> Hive modules that moved into Hive years ago and also modules like
>> parquet-scrooge and parquet-scala that are based on Scala 2.10, which has
>> been EOL for years.
>> 
>> We also have 2 command-line utilities, parquet-tools and parquet-cli. The
>> parquet-cli version is friendlier to use, but I’m clearly biased. In any
>> case, I don’t think we need to maintain both and it is confusing for users
>> to have two modules that do the same thing.
>> 
>> I propose we remove the following modules:
>> 
>>   - parquet-hive-*
>>   - parquet-scrooge
>>   - parquet-scala
>>   - parquet-tools
>>   - parquet-hadoop-bundle (shaded deps)
>>   - parquet-cascading (in favor of parquet-cascading3, if we keep it)
>> 
>> There are also modules that I’m not sure about. Does anyone use these?
>> 
>>   - parquet-thrift
>>   - parquet-pig
>>   - parquet-cascading3
>> 
>> Pig hasn’t had an update (other than project-wide changes) since Oct 2017.
>> I think it may be time to drop support in Pig and allow that to exist as a
>> separate project if anyone is still interested in it.
>> 
>> In the last few years, we’ve moved more to a model where processing
>> frameworks and engines maintain their own integration. Spark, Presto,
>> Iceberg, and Hive fall into this category. So I would prefer to drop Pig
>> and Cascading3. I’m fine keeping thrift if people think it is useful.
>> 
>> Thoughts?
>> 
>> rb
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>> 
