Agreed, I think it would be useful to make sure the "compute" interfaces
have the right hooks to support alternate encodings.
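
As a rough illustration of the prototype idea quoted below (keep data
run-length encoded in memory, let compute functions work on the runs, and
expand to plain form only at the write boundary), here is a minimal sketch.
It is plain Python rather than Arrow C++, and the names rle_encode,
rle_decode and rle_sum are hypothetical, not Arrow APIs.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand runs back to a plain list, e.g. just before writing a message."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def rle_sum(runs):
    """A compute function that works on the runs directly, without decoding."""
    return sum(value * length for value, length in runs)


values = [7, 7, 7, 2, 2, 9]
runs = rle_encode(values)
```

The point of rle_sum is only that a kernel with the right hook never needs
the decoded array; rle_decode stands in for the "convert to non-RLE when
writing" step.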

On Sunday, August 30, 2020, Wes McKinney <wesmck...@gmail.com> wrote:

> That said, there is nothing preventing the development of programming
> interfaces for compressed / encoded data right now. When it comes to
> transporting such data, that's when we will have to decide on what to
> support and what new metadata structures are required.
>
> For example, we could add RLE to C++ in prototype form and then
> convert to non-RLE when writing to IPC messages.
>
> On Sat, Aug 29, 2020 at 7:34 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > Hi Mark,
> > See the most recent discussion about alternate encodings [1].
> > This is something that should be added in the long run, but I'd
> > personally prefer to start with simpler encodings.
> >
> > I don't think we should add anything more with regard to
> > compression/encoding until at least 3 languages support the current
> > compression methods that are in the specification.  C++ has it
> > implemented, there is some work in Java and I think we should have at
> > least one more.
> >
> > -Micah
> >
> > [1]
> > https://lists.apache.org/thread.html/r1d9d707c481c53c13534f7c72d75c7a90dc7b2b9966c6c0772d0e416%40%3Cdev.arrow.apache.org%3E
> >
> > On Sat, Aug 29, 2020 at 4:04 PM <m...@markfarnan.com> wrote:
> >
> > >
> > > I was looking at compression in Arrow and had a couple of questions.
> > >
> > > If I've understood how compression currently works, it is only used
> > > 'in flight' in either IPC or Arrow Flight, using block compression,
> > > and the data is still decoded into RAM at the destination in full
> > > array form.  Is this correct?
> > >
> > >
> > > Given that Arrow is a columnar format, has any thought been given to an
> > > option to have the data compressed both in memory and in flight, using
> > > some of the columnar techniques?  As I deal primarily with time-series
> > > numerical data, I was thinking that some of the algorithms from the
> > > Gorilla paper [1] for floats and timestamps (delta-of-delta), or
> > > similar, might be appropriate.
> > >
> > > The interface functions could still iterate over the data and produce
> > > raw values, so this is transparent to users of the data, but the data
> > > blocks/arrays in memory are actually compressed.
> > >
> > > With this method, blocks could come out of a database/source, through
> > > the data service, across the wire (Flight), and land in the consuming
> > > application's memory without ever being decompressed or processed until
> > > final use.
> > >
> > >
> > > Crazy thought?
> > >
> > >
> > > Regards
> > >
> > > Mark.
> > >
> > >
> > > [1]: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
> > >
> > >
>
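
The delta-of-delta idea for timestamps mentioned in the quoted message,
together with an interface that still iterates over raw values, could be
sketched roughly as follows. This is a simplified integer-level
illustration, not the bit-packed encoding from the Gorilla paper and not
anything in Arrow; dod_encode and dod_iter are hypothetical names.

```python
def dod_encode(timestamps):
    """Return (first, first_delta, delta_of_deltas) for integer timestamps."""
    if not timestamps:
        return None, None, []
    if len(timestamps) == 1:
        return timestamps[0], None, []
    # First-order differences between neighbouring timestamps.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Second-order differences; near zero for regularly spaced samples.
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def dod_iter(first, first_delta, dods):
    """Lazily yield the raw timestamps from the encoded form."""
    if first is None:
        return
    yield first
    if first_delta is None:
        return
    current, delta = first + first_delta, first_delta
    yield current
    for dod in dods:
        delta += dod
        current += delta
        yield current


ts = [1000, 1060, 1120, 1181, 1241]
first, first_delta, dods = dod_encode(ts)
decoded = list(dod_iter(first, first_delta, dods))
```

For regularly sampled series most delta-of-delta values are zero or close
to it, which is what makes them cheap to store; the generator shows how a
consumer could read raw values without the whole array being materialised
first.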
