Re: Encoding options (delta, rle, ...) in pyarrow bindings

Uwe L. Korn Fri, 02 Nov 2018 06:48:55 -0700

Hello Sebastian,

there is no ETA on delta encoding, as no one is actively working on it. There 
is some basic code implementing the relevant encoders in [1]. This code is not 
used at all at the moment, as it does not implement the required APIs. The 
relevant JIRA tickets are [2], [3], and [4]. You can ask questions or discuss 
the implementation there (or on the ML). The existing code needs to be ported 
to fulfill the interface defined in [5]. Also note that we have moved the 
parquet-cpp code into the arrow repository, so all changes will go there.


Uwe

[1]: https://github.com/apache/parquet-cpp/blob/d15d2687e9f154e69e956e2a56c8d1fd6c3b7ac8/benchmarks/decode_benchmark.cc
[2]: https://issues.apache.org/jira/browse/PARQUET-491
[3]: https://issues.apache.org/jira/browse/PARQUET-490
[4]: https://issues.apache.org/jira/browse/PARQUET-492
[5]: https://github.com/apache/arrow/blob/master/cpp/src/parquet/encoding.h

On Fri, Nov 2, 2018, at 2:33 PM, Sebastian Himberger wrote:
> Uwe, Wes,
> 
> thanks so much. I completely forgot to say that I was asking about parquet.
> It's good to know the current status though. I also didn't know that the
> dictionary encoding already has some form of RLE.
> 
> @Uwe: Any ETA on delta encoding? Is this being worked on or are other things
> more important ATM? I am not asking to generate pressure but out of
> curiosity. I appreciate that this is an open source project and if I need
> it I can just jump in and do it myself.
> 
> Thanks again and have a great day,
> Sebastian
> 
> 
> On Fri, Nov 2, 2018 at 2:27 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> > Hi Sebastian -- Uwe is referring to Parquet files. We don't yet have
> > in-memory RLE or Delta encoding in the Arrow columnar format. I suspect
> > this will eventually be added as it can be quite important to improve
> > in-memory query execution performance.
> >
> > Wes
> >
> > On Fri, Nov 2, 2018, 2:18 PM Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> > > Hello Sebastian,
> > >
> > > currently you can only switch between plain encoding and dictionary
> > > encoding combined with run-length encoding, using the `use_dictionary`
> > > flag of write_table, see
> > > https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
> > > Other encodings are so far only implemented on the read path; we cannot
> > > write delta encodings yet.
> > >
> > > Uwe
> > >
> > > On Fri, Nov 2, 2018, at 12:53 PM, Sebastian Himberger wrote:
> > > > Hi,
> > > >
> > > > I hope this is the right list. I couldn't find a "users" list on the
> > > > website so please forgive me if I am interrupting here.
> > > >
> > > > I am developing an application using the pyarrow module. Reading
> > > > through the documentation, I couldn't find a way to specify an encoding
> > > > like delta or run-length for a column. Is this not supported yet or am
> > > > I missing something?
> > > >
> > > > Thanks so much,
> > > > Sebastian
> > >
> >
