hi Ivan -- as soon as practical it would be great to import the codebase into the Apache project. We would have to conduct an IP clearance process (http://incubator.apache.org/ip-clearance/) because the code was not developed within the Community (i.e. under Apache process / IP oversight / governance). The code does not have to be feature complete nor production-ready in order to do this (we built most of the C++ implementation within github.com/apache/parquet-cpp starting in the beginning of 2016).
Let us know, we're always here to help.

thanks
Wes

On Mon, Jan 29, 2018 at 5:58 PM, Ivan Sadikov <[email protected]> wrote:
> Link: https://github.com/sunchao/parquet-rs
>
> I think @sunchao is in Apache Community already, there is an email on the
> GitHub profile page.
>
> We are just trying to bring it up to speed with other Parquet
> implementations, but there is still a lot of work to do :) Would appreciate
> any help!
>
> Currently adding encodings and decodings, I think only Delta byte array
> encodings and decodings are left - I will be adding them shortly.
>
>
> Cheers,
>
> Ivan
>
> On Tue, 30 Jan 2018 at 11:12 AM, Wes McKinney <[email protected]> wrote:
>
>> Cool. Where is this development happening? Would you like to join the
>> Apache Parquet community?
>>
>> - Wes
>>
>> On Mon, Jan 29, 2018 at 4:20 PM, Ivan Sadikov <[email protected]> wrote:
>> > Thanks Wes. It is okay, I fixed the issues, so everything is great.
>> >
>> > We are currently pushing parquet-rs to be feature compatible with
>> > parquet-mr and parquet-cpp.
>> >
>> > On Tue, 30 Jan 2018 at 9:57 AM, Wes McKinney <[email protected]> wrote:
>> >
>> >> hi Ivan -- note that this code has not been actively maintained
>> >> because this encoding is not in wide use yet (so there could be
>> >> discrepancies vs. what is in parquet-mr).
>> >>
>> >> thanks,
>> >> Wes
>> >>
>> >> On Sun, Jan 28, 2018 at 10:50 PM, Ivan Sadikov <[email protected]> wrote:
>> >> > Hello,
>> >> >
>> >> > I am currently trying to debug DeltaLengthByteArrayDecoder in
>> >> > parquet-cpp and cannot understand how it knows where the encoded
>> >> > lengths part ends (for the delta bit packing decoder) and the actual
>> >> > byte array data begins.
>> >> >
>> >> > I can see that parquet-mr simply loads all data to reach the end of
>> >> > the encoded lengths, but it looks like parquet-cpp does it differently.
>> >> >
>> >> > Would appreciate any help with this!
>> >> > Thanks!
>> >> >
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Ivan
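
For the question at the bottom of the thread (how a decoder can tell where the encoded lengths end and the byte array data begins), here is a minimal sketch of the general idea, not the parquet-cpp or parquet-mr code. In the Parquet format, a DELTA_LENGTH_BYTE_ARRAY page stores all lengths first, followed by the concatenated byte array data, and the length section states how many values it holds. A decoder can therefore decode exactly that many lengths while tracking its read position; the data begins at the position where it stops. To keep the example short, the lengths below use a plain varint encoding as a stand-in for the real delta bit packing, and every function name (`read_uleb128`, `decode_lengths`, `decode_delta_length_byte_array`) is hypothetical.

```python
from typing import List, Tuple


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one unsigned LEB128 varint from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def decode_lengths(buf: bytes, pos: int) -> Tuple[List[int], int]:
    """Stand-in for the delta bit packing decoder (NOT the real encoding).

    Assumed layout for this sketch: a varint count, then one varint per length.
    Returns the lengths and the offset of the first byte after them.
    """
    count, pos = read_uleb128(buf, pos)
    lengths = []
    for _ in range(count):
        length, pos = read_uleb128(buf, pos)
        lengths.append(length)
    return lengths, pos


def decode_delta_length_byte_array(buf: bytes) -> List[bytes]:
    """Decode the length section, then slice the values out of the remaining bytes."""
    lengths, data_start = decode_lengths(buf, 0)
    values = []
    offset = data_start  # the byte array data begins where the lengths ended
    for length in lengths:
        values.append(buf[offset:offset + length])
        offset += length
    return values


# Hypothetical page: three lengths (5, 5, 6) followed by the concatenated bytes.
page = bytes([3, 5, 5, 6]) + b"HelloWorldFoobar"
print(decode_delta_length_byte_array(page))
```

The point of the design is that the length decoder returns its end offset alongside the values, so the caller never has to guess or scan for the boundary. A real implementation would swap `decode_lengths` for a delta bit packing decoder that reports how many bytes it consumed.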
