Re: [Parquet] ALP Encoding for Floating point data

Julien Le Dem Tue, 13 Jan 2026 17:34:46 -0800

Thank you Micah for the detailed review!
Who else needs to do a round of reviews on the spec before we can finalize
it?



On Tue, Jan 13, 2026 at 10:07 AM PRATEEK GAUR <[email protected]> wrote:

> Thanks Micah for a round of feedback.
>
> Here is a link to the spec document :
> https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit
>
> On Tue, Nov 25, 2025 at 8:57 AM PRATEEK GAUR <[email protected]> wrote:
>
> > On Sat, Nov 22, 2025 at 4:49 AM Steve Loughran <[email protected]>
> > wrote:
> >
> >> First, sorry: I think I accidentally marked as done the comment in the
> >> doc about x86 performance.
> >>
> >
> > No worries, I restored the thread :).
> >
> > Those x86 numbers are critical, especially AVX512 in a recent Intel part.
> >> There's a notorious feature in the early ones where the cores would
> reduce
> >> frequency after you used the opcodes as a way of managing die
> temperature (
> >>
> https://stackoverflow.com/questions/56852812/simd-instructions-lowering-cpu-frequency
> >> ); the later ones and AMD models are the ones to worry about.
> >>
> >
> > We did collect performance numbers in our early prototype and they looked
> > good on x86 hardware. Though I didn't check the processor family.
> > In our arrow implementation we are also working on a comprehensive
> > benchmarking script which will help everyone run it on different CPU
> > families to get a good idea of performance.
> >
> > Best
> > Prateek
> >
> >
> >> On Sat, 22 Nov 2025 at 04:15, Prateek Gaur via dev <
> >> [email protected]> wrote:
> >>
> >>> Hi team,
> >>>
> >>> *ALP ---> ALP PseudoDecimal*
> >>>
> >>> As is visible from the numbers above, and as stated in the paper too,
> >>> for real double values, i.e. values with high precision, it is very
> >>> difficult to get a good compression ratio.
> >>>
> >>> This, combined with the fact that we want to keep the
> >>> spec/implementation simpler, quoting Antoine directly here
> >>>
> >>> `*2. Do not include the ALPrd fallback which is a homegrown dictionary*
> >>>
> >>> *encoding without dictionary reuse across pages, and instead rely on
> >>> a well-known Parquet encoding (such as BYTE_STREAM_SPLIT?)*`
> >>>
> >>> Also based on some discussion I had with Julien in person and the
> >>> biweekly
> >>> meeting with a number of you.
> >>>
> >>> We'll be going with ALPpd (pseudo decimal) as the first
> >>> implementation, relying on the query engine to decide, based on its
> >>> own heuristics, on the right fallback to BYTE_STREAM_SPLIT or ZSTD.
> >>>
> >>> Best
> >>> Prateek
> >>>
> >>>
> >>>
> >>> On Thu, Nov 20, 2025 at 5:09 PM Prateek Gaur <
> [email protected]
> >>> >
> >>> wrote:
> >>>
> >>> > Sheet with numbers
> >>> > <
> >>>
> https://docs.google.com/spreadsheets/d/1NmCg0WZKeZUc6vNXXD8M3GIyNqF_H3goj6mVbT8at7A/edit?gid=1351944517#gid=1351944517
> >>> >
> >>> > .
> >>> >
> >>> > On Thu, Nov 20, 2025 at 5:09 PM PRATEEK GAUR <[email protected]>
> >>> wrote:
> >>> >
> >>> >> Hi team,
> >>> >>
> >>> >> There was a request from a few folks, Antoine Pitrou and Adam Reeve
> >>> if I
> >>> >> remember correctly, to perform the experiment on some of the papers
> >>> that
> >>> >> talked about BYTE_STREAM_SPLIT for completeness.
> >>> >> I wanted to share the numbers for the same in this sheet. At this
> >>> point
> >>> >> we have numbers on a wide variety of data.
> >>> >> (Will have to share the sheet from my snowflake account as our
> >>> >> laptops have a fair bit of restriction with respect to copy paste
> >>> >> permissions :) )
> >>> >>
> >>> >> Best
> >>> >> Prateek
> >>> >>
> >>> >> On Thu, Nov 20, 2025 at 2:25 PM PRATEEK GAUR <[email protected]>
> >>> wrote:
> >>> >>
> >>> >>> Hi Julien,
> >>> >>>
> >>> >>> Yes based on
> >>> >>>
> >>> >>>    - Numbers presented
> >>> >>>    - Discussions over the doc and
> >>> >>>    - Multiple discussions in the biweekly meeting
> >>> >>>
> >>> >>> We are in a stage where we agree this is the right encoding to add
> >>> and
> >>> >>> we can move to the DRAFT/POC stage from DISCUSS stage.
> >>> >>> Will start working on the PR for the same.
> >>> >>>
> >>> >>> Thanks for bringing this up.
> >>> >>> Prateek
> >>> >>>
> >>> >>> On Thu, Nov 20, 2025 at 8:16 AM Julien Le Dem <[email protected]>
> >>> wrote:
> >>> >>>
> >>> >>>> @PRATEEK GAUR <[email protected]> : Would you agree that we are
> >>> past
> >>> >>>> the DISCUSS step and into the DRAFT/POC phase according to the
> >>> proposals
> >>> >>>> process <
> >>> https://github.com/apache/parquet-format/tree/master/proposals
> >>> >>>> >?
> >>> >>>> If yes, could you open a PR on this page to add this proposal to
> the
> >>> >>>> list?
> >>> >>>> https://github.com/apache/parquet-format/tree/master/proposals
> >>> >>>> Thank you!
> >>> >>>>
> >>> >>>>
> >>> >>>> On Thu, Oct 30, 2025 at 2:38 PM Andrew Lamb <
> [email protected]
> >>> >
> >>> >>>> wrote:
> >>> >>>>
> >>> >>>> > I have filed a ticket[1] in arrow-rs to track prototyping ALP in
> >>> the
> >>> >>>> Rust
> >>> >>>> > Parquet reader if anyone is interested
> >>> >>>> >
> >>> >>>> > Andrew
> >>> >>>> >
> >>> >>>> > [1]:  https://github.com/apache/arrow-rs/issues/8748
> >>> >>>> >
> >>> >>>> > On Wed, Oct 22, 2025 at 1:33 PM Micah Kornfield <
> >>> >>>> [email protected]>
> >>> >>>> > wrote:
> >>> >>>> >
> >>> >>>> > > >
> >>> >>>> > > > C++, Java and Rust support them for sure. I feel like we
> >>> should
> >>> >>>> > > > probably default to V2 at some point.
> >>> >>>> > >
> >>> >>>> > >
> >>> >>>> > > I seem to recall, some of the vectorized java readers
> (Iceberg,
> >>> >>>> Spark)
> >>> >>>> > > might not support V2 data pages (but I might be confusing this
> >>> with
> >>> >>>> > > encodings).  But this is only a vague recollection.
> >>> >>>> > >
> >>> >>>> > >
> >>> >>>> > >
> >>> >>>> > > On Wed, Oct 22, 2025 at 6:38 AM Andrew Lamb <
> >>> [email protected]
> >>> >>>> >
> >>> >>>> > > wrote:
> >>> >>>> > >
> >>> >>>> > > > > Someone has to add V2 data pages to
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> >>> >>>> > > > > :)
> >>> >>>> > > >
> >>> >>>> > > > Your wish is my command:
> >>> >>>> > https://github.com/apache/parquet-site/pull/124
> >>> >>>> > > >
> >>> >>>> > > > As the format grows in popularity and momentum builds to
> >>> evolve,
> >>> >>>> I feel
> >>> >>>> > > the
> >>> >>>> > > > content on the parquet.apache.org site could use
> refreshing /
> >>> >>>> > updating.
> >>> >>>> > > > So, while I had the site open, I made some other PRs to
> >>> scratch
> >>> >>>> various
> >>> >>>> > > > itches
> >>> >>>> > > >
> >>> >>>> > > > (I am absolutely 🎣 for someone to please review 🙏):
> >>> >>>> > > >
> >>> >>>> > > > 1. Add Variant/Geometry/Geography types to implementation
> >>> status
> >>> >>>> > matrix:
> >>> >>>> > > > https://github.com/apache/parquet-site/pull/123
> >>> >>>> > > > 2. Improve introduction / overview, add more links to spec
> and
> >>> >>>> > > > implementation status:
> >>> >>>> https://github.com/apache/parquet-site/pull/125
> >>> >>>> > > >
> >>> >>>> > > >
> >>> >>>> > > > Thanks,
> >>> >>>> > > > Andrew
> >>> >>>> > > >
> >>> >>>> > > > On Wed, Oct 22, 2025 at 4:09 AM Antoine Pitrou <
> >>> >>>> [email protected]>
> >>> >>>> > > wrote:
> >>> >>>> > > >
> >>> >>>> > > > >
> >>> >>>> > > > > Hi Julien, hi all,
> >>> >>>> > > > >
> >>> >>>> > > > > On Mon, 20 Oct 2025 15:14:58 -0700
> >>> >>>> > > > > Julien Le Dem <[email protected]> wrote:
> >>> >>>> > > > > >
> >>> >>>> > > > > > Another question from me:
> >>> >>>> > > > > >
> >>> >>>> > > > > > Since the goal is to not use compression at all in this
> >>> case
> >>> >>>> (no
> >>> >>>> > > ZSTD)
> >>> >>>> > > > > > I'm assuming we would be using either:
> >>> >>>> > > > > > - the Data Page V1 with UNCOMPRESSED in the
> >>> >>>> ColumnMetadata.column
> >>> >>>> > > > > > <
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L887
> >>> >>>> > > > > >
> >>> >>>> > > > > > field.
> >>> >>>> > > > > > - the Data Page V2 with false in the
> >>> >>>> DataPageHeaderV2.is_compressed
> >>> >>>> > > > > > <
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://github.com/apache/parquet-format/blob/786142e26740487930ddc3ec5e39d780bd930907/src/main/thrift/parquet.thrift#L746
> >>> >>>> > > > > >
> >>> >>>> > > > > > field
> >>> >>>> > > > > > The second helping decide if we can selectively compress
> >>> some
> >>> >>>> pages
> >>> >>>> > > if
> >>> >>>> > > > > they
> >>> >>>> > > > > > are less compressed by the
> >>> >>>> > > > > > A few years ago there was a question on the support of
> the
> >>> >>>> > > DATA_PAGE_V2
> >>> >>>> > > > > and
> >>> >>>> > > > > > I was curious to hear a refresh on how that's generally
> >>> >>>> supported
> >>> >>>> > in
> >>> >>>> > > > > > Parquet implementations. The is_compressed field was
> >>> exactly
> >>> >>>> > intended
> >>> >>>> > > > to
> >>> >>>> > > > > > avoid block compression when the encoding itself is good
> >>> >>>> enough.
> >>> >>>> > > > >
> >>> >>>> > > > > Someone has to add V2 data pages to
> >>> >>>> > > > >
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> >>> >>>> > > > > :)
> >>> >>>> > > > >
> >>> >>>> > > > > C++, Java and Rust support them for sure. I feel like we
> >>> should
> >>> >>>> > > > > probably default to V2 at some point.
> >>> >>>> > > > >
> >>> >>>> > > > > Also see
> https://github.com/apache/parquet-java/issues/3344
> >>> for
> >>> >>>> > Java.
> >>> >>>> > > > >
> >>> >>>> > > > > Regards
> >>> >>>> > > > >
> >>> >>>> > > > > Antoine.
> >>> >>>> > > > >
> >>> >>>> > > > >
> >>> >>>> > > > > >
> >>> >>>> > > > > > Julien
> >>> >>>> > > > > >
> >>> >>>> > > > > > On Mon, Oct 20, 2025 at 11:57 AM Andrew Lamb
> >>> >>>> > > > > <[email protected]> wrote:
> >>> >>>> > > > > >
> >>> >>>> > > > > > > Thanks again Prateek and co for pushing this along!
> >>> >>>> > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > > 1. Design and write our own Parquet-ALP spec so that
> >>> >>>> > > > implementations
> >>> >>>> > > > > > > > know exactly how to encode and represent data
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > 100% agree with this (similar to what was done for
> >>> >>>> > ParquetVariant)
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > > 2. I may be missing something, but the paper doesn't
> >>> seem
> >>> >>>> to
> >>> >>>> > > > > mention
> >>> >>>> > > > > > > non-finite values (such as +/-Inf and NaNs).
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > I think they are handled via the "Exception"
> mechanism.
> >>> >>>> Vortex's
> >>> >>>> > > ALP
> >>> >>>> > > > > > > implementation (below) does appear to handle non-finite
> >>> >>>> > > > > > > numbers[2]
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > > 3. It seems there is a single implementation, which
> is
> >>> >>>> the one
> >>> >>>> > > > > published
> >>> >>>> > > > > > > > together with the paper. It is not obvious that it
> >>> will be
> >>> >>>> > > > > > > > maintained in the future, and reusing it is probably
> >>> not
> >>> >>>> an
> >>> >>>> > > option
> >>> >>>> > > > > for
> >>> >>>> > > > > > > > non-C++ Parquet implementations
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > My understanding from the call was that Prateek and
> team
> >>> >>>> > > > re-implemented
> >>> >>>> > > > > > > ALP  (did not use the implementation from CWI[3]) but
> >>> that
> >>> >>>> would
> >>> >>>> > be
> >>> >>>> > > > > good to
> >>> >>>> > > > > > > confirm.
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > There is also a Rust implementation of ALP[1] that is
> >>> part
> >>> >>>> of the
> >>> >>>> > > > > Vortex
> >>> >>>> > > > > > > file format implementation. I have not reviewed it to
> >>> see
> >>> >>>> if it
> >>> >>>> > > > > deviates
> >>> >>>> > > > > > > from the algorithm presented in the paper.
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > Andrew
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > [1]:
> >>> >>>> > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://github.com/vortex-data/vortex/blob/534821969201b91985a8735b23fc0c415a425a56/encodings/alp/src/lib.rs
> >>> >>>> > > > > > > [2]:
> >>> >>>> > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://github.com/vortex-data/vortex/blob/534821969201b91985a8735b23fc0c415a425a56/encodings/alp/src/alp/compress.rs#L266-L281
> >>> >>>> > > > > > > [3]: https://github.com/cwida/ALP
> >>> >>>> > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > On Mon, Oct 20, 2025 at 4:47 AM Antoine Pitrou
> >>> >>>> > > > > <[email protected]> wrote:
> >>> >>>> > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > Hello,
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > Thanks for doing this and I agree the numbers look
> >>> >>>> impressive.
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > I would ask if possible for more data points:
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 1. More datasets: you could for example look at the
> >>> >>>> datasets
> >>> >>>> > that
> >>> >>>> > > > > were
> >>> >>>> > > > > > > > used to originally evaluate BYTE_STREAM_SPLIT (see
> >>> >>>> > > > > > > > https://issues.apache.org/jira/browse/PARQUET-1622
> >>> and
> >>> >>>> > > > specifically
> >>> >>>> > > > > > > > the Google Doc linked there)
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 2. Comparison to BYTE_STREAM_SPLIT + LZ4 and
> >>> >>>> BYTE_STREAM_SPLIT
> >>> >>>> > +
> >>> >>>> > > > ZSTD
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 3. Optionally, some perf numbers on x86 too, but I
> >>> expect
> >>> >>>> that
> >>> >>>> > > ALP
> >>> >>>> > > > > will
> >>> >>>> > > > > > > > remain very good there as well
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > I also have the following reservations towards ALP:
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 1. There is no published official spec AFAICT, just
> a
> >>> >>>> research
> >>> >>>> > > > paper.
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 2. I may be missing something, but the paper doesn't
> >>> seem
> >>> >>>> to
> >>> >>>> > > > mention
> >>> >>>> > > > > > > > non-finite values (such as +/-Inf and NaNs).
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 3. It seems there is a single implementation, which
> is
> >>> >>>> the one
> >>> >>>> > > > > published
> >>> >>>> > > > > > > > together with the paper. It is not obvious that it
> >>> will be
> >>> >>>> > > > > > > > maintained in the future, and reusing it is probably
> >>> not
> >>> >>>> an
> >>> >>>> > > option
> >>> >>>> > > > > for
> >>> >>>> > > > > > > > non-C++ Parquet implementations
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 4. The encoding itself is complex, since it
> involves a
> >>> >>>> fallback
> >>> >>>> > > on
> >>> >>>> > > > > > > > another encoding if the primary encoding (which
> >>> >>>> constitutes the
> >>> >>>> > > > real
> >>> >>>> > > > > > > > innovation) doesn't work out on a piece of data.
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > Based on this, I would say that if we think ALP is
> >>> >>>> attractive
> >>> >>>> > for
> >>> >>>> > > > us,
> >>> >>>> > > > > > > > we may want to incorporate our own version of ALP
> >>> with the
> >>> >>>> > > > following
> >>> >>>> > > > > > > > changes:
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 1. Design and write our own Parquet-ALP spec so that
> >>> >>>> > > > implementations
> >>> >>>> > > > > > > > know exactly how to encode and represent data
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 2. Do not include the ALPrd fallback which is a
> >>> homegrown
> >>> >>>> > > > dictionary
> >>> >>>> > > > > > > > encoding without dictionary reuse across pages, and
> >>> >>>> instead
> >>> >>>> > rely
> >>> >>>> > > > on
> >>> >>>> > > > > a
> >>> >>>> > > > > > > > well-known Parquet encoding (such as
> >>> BYTE_STREAM_SPLIT?)
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > 3. Replace the FOR encoding inside ALP, which aims
> at
> >>> >>>> > compressing
> >>> >>>> > > > > > > > integers efficiently, with our own
> DELTA_BINARY_PACKED
> >>> >>>> (which
> >>> >>>> > has
> >>> >>>> > > > the
> >>> >>>> > > > > > > > same qualities and is already available in Parquet
> >>> >>>> > > implementations)
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > Regards
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > Antoine.
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > > On Thu, 16 Oct 2025 14:47:33 -0700
> >>> >>>> > > > > > > > PRATEEK GAUR <[email protected]> wrote:
> >>> >>>> > > > > > > > > Hi team,
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > We spent some time evaluating ALP compression and
> >>> >>>> > decompression
> >>> >>>> > > > > > > compared
> >>> >>>> > > > > > > > to
> >>> >>>> > > > > > > > > other encoding alternatives like CHIMP/GORILLA and
> >>> >>>> > compression
> >>> >>>> > > > > > > techniques
> >>> >>>> > > > > > > > > like SNAPPY/LZ4/ZSTD. We presented these numbers
> to
> >>> the
> >>> >>>> > > community
> >>> >>>> > > > > > > members
> >>> >>>> > > > > > > > > on October 15th in the biweekly parquet meeting.
> ( I
> >>> >>>> can't
> >>> >>>> > seem
> >>> >>>> > > > > to
> >>> >>>> > > > > > > access
> >>> >>>> > > > > > > > > the recording, so please let me know what access
> >>> rules
> >>> >>>> I need
> >>> >>>> > > to
> >>> >>>> > > > > get to
> >>> >>>> > > > > > > > be
> >>> >>>> > > > > > > > > able to view it )
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > We did this evaluation over some datasets pointed to
> >>> >>>> > > > > > > > > by the ALP paper and some pointed to by the parquet
> >>> >>>> > > > > > > > > community.
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > The results are available in the following
> document
> >>> >>>> > > > > > > > > <
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg/edit?tab=t.0
> >>> >>>> > > > >
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > :
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>>
> https://docs.google.com/document/d/1PlyUSfqCqPVwNt8XA-CfRqsbc0NKRG0Kk1FigEm3JOg
> >>> >>>> > > > >
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > Based on the numbers we see
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > >    - ALP is comparable to ZSTD(level=1) in terms of
> >>> >>>> > > > > > > > >    compression ratio and much better compared to other
> >>> >>>> > > > > > > > >    schemes. (Numbers in the sheet are bytes needed to
> >>> >>>> > > > > > > > >    encode each value.)
> >>> >>>> > > > > > > > >    - ALP is doing quite well in terms of decompression
> >>> >>>> > > > > > > > >    speed. (Numbers in the sheet are bytes decompressed
> >>> >>>> > > > > > > > >    per second.)
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > As next steps we will
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > >    - Get the numbers for compression on top of
> byte
> >>> >>>> stream
> >>> >>>> > > split.
> >>> >>>> > > > > > > > >    - Evaluate the algorithm over a few more
> >>> datasets.
> >>> >>>> > > > > > > > >    - Have an implementation in the arrow-parquet
> >>> repo.
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > Looking forward to feedback from the community.
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > > > Best
> >>> >>>> > > > > > > > > Prateek and Dhirhan
> >>> >>>> > > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > > >
> >>> >>>> > > > > > >
> >>> >>>> > > > > >
> >>> >>>> > > > >
> >>> >>>> > > > >
> >>> >>>> > > > >
> >>> >>>> > > > >
> >>> >>>> > > >
> >>> >>>> > >
> >>> >>>> >
> >>> >>>>
> >>> >>>
> >>>
> >>
>
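
For readers following the thread, here is a rough sketch of the pseudo-decimal
idea and the "exception" mechanism discussed above. It is not the Parquet-ALP
spec and not the code from the ALP paper. The names alp_encode and alp_decode
are hypothetical, a single fixed exponent is assumed for the whole (non-empty)
input, decoding uses plain division rather than the factor multiplication in
the paper, and the frame-of-reference step only computes the base and bit
width without doing any bit-packing.

```python
import math


def alp_encode(values, exponent):
    """Simplified ALP-style encoding: doubles -> integers plus exceptions."""
    scale = 10 ** exponent
    encoded = []      # one integer per value, None marks an exception
    exceptions = {}   # position -> original double, stored verbatim
    for i, v in enumerate(values):
        d = None
        scaled = v * scale
        if math.isfinite(scaled):  # rejects NaN, +/-Inf and overflow
            candidate = round(scaled)
            negative_zero = candidate == 0 and math.copysign(1.0, v) < 0
            # Keep the integer only if it decodes back to the same double.
            if candidate / scale == v and not negative_zero:
                d = candidate
        if d is None:
            exceptions[i] = v
        encoded.append(d)

    # Fill exception slots with an existing value so they do not widen
    # the integer range that the frame-of-reference step has to cover.
    valid = [d for d in encoded if d is not None]
    filler = valid[0] if valid else 0
    encoded = [filler if d is None else d for d in encoded]

    # Frame of reference: store the minimum once, then small offsets.
    base = min(encoded)
    deltas = [d - base for d in encoded]
    bit_width = max(deltas).bit_length()
    return base, bit_width, deltas, exceptions


def alp_decode(base, deltas, exceptions, exponent):
    """Inverse of alp_encode: rebuild the doubles, then patch exceptions."""
    scale = 10 ** exponent
    out = [(d + base) / scale for d in deltas]
    for i, v in exceptions.items():
        out[i] = v
    return out
```

In this sketch, a value such as 0.1 + 0.2 with exponent 2 does not round-trip
and so lands in the exceptions map, which is the kind of high-precision data
the thread says compresses poorly. NaN and +/-Inf take the same path.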
