hi Ivan -- as soon as practical it would be great to import the
codebase into the Apache project. We would have to conduct an IP
clearance process (http://incubator.apache.org/ip-clearance/) because
the code was not developed within the Community (i.e. under Apache
process / IP oversight / governance). The code does not have to be
feature complete nor production-ready in order to do this (we built
most of the C++ implementation within github.com/apache/parquet-cpp
starting at the beginning of 2016).

Let us know, we're always here to help.

thanks
Wes

On Mon, Jan 29, 2018 at 5:58 PM, Ivan Sadikov <[email protected]> wrote:
> Link: https://github.com/sunchao/parquet-rs
>
> I think @sunchao is already in the Apache community; there is an email on
> the GitHub profile page.
>
> We are just trying to bring it up to speed with other Parquet
> implementations, but there is still a lot of work to do :) Would appreciate
> any help!
>
> I am currently adding encodings and decodings. I think only the delta byte
> array encodings and decodings are left; I will be adding them shortly.
>
>
> Cheers,
>
> Ivan
> On Tue, 30 Jan 2018 at 11:12 AM, Wes McKinney <[email protected]> wrote:
>
>> Cool. Where is this development happening? Would you like to join the
>> Apache Parquet community?
>>
>> - Wes
>>
>> On Mon, Jan 29, 2018 at 4:20 PM, Ivan Sadikov <[email protected]>
>> wrote:
>> > Thanks Wes. It is okay, I fixed the issues, so everything is great.
>> >
>> > We are currently pushing parquet-rs to be feature compatible with
>> > parquet-mr and parquet-cpp.
>> > On Tue, 30 Jan 2018 at 9:57 AM, Wes McKinney <[email protected]> wrote:
>> >
>> >> hi Ivan -- note that this code has not been actively maintained
>> >> because this encoding is not in wide use yet (so there could be
>> >> discrepancies vs. what is in parquet-mr).
>> >>
>> >> thanks,
>> >> Wes
>> >>
>> >> On Sun, Jan 28, 2018 at 10:50 PM, Ivan Sadikov <[email protected]>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > I am currently trying to debug DeltaLengthByteArrayDecoder in
>> >> > parquet-cpp and cannot understand how it knows where the encoded
>> >> > lengths part ends (for the delta bit packing decoder) and the actual
>> >> > byte array data begins.
>> >> >
>> >> > I can see that parquet-mr simply loads all data to reach the end of
>> >> > the encoded lengths, but it looks like parquet-cpp does it differently.
>> >> >
>> >> > Would appreciate any help with this!
>> >> > Thanks!
>> >> >
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Ivan
>> >>
>>
