Re: [Parquet] ALP Encoding for Floating point data

PRATEEK GAUR Wed, 21 Jan 2026 14:33:47 -0800

Hi Andrew,

Thanks :).



   - Interleaved bit-packing: Yes, this has been on my mind, and thanks for
   bringing it up. It also came up as part of the benchmark discussion for
   pFOR. With these improvements in mind, we designed the *ALP spec such
   that it allows* the current FOR-based integer encoding to be swapped out
   for FastLanes, which is what I think Peter was referring to.
   - Exception layout: Given the way the hyperparameters are picked, the
   number of exceptions has to be low, as each exception has slightly higher
   overhead in both storage and read cost. This means that reading
   exceptions is *not on the performance-critical path*, so I am not sure
   that trying more complicated GPU-friendly encodings will give a general
   improvement. That said, thankfully :), the ALP spec has been written with
   this extension in mind, and one can change the version to accommodate a
   different exception encoding.
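
To make the two points above concrete, here is a minimal Python sketch of the
general idea: decimal scaling, frame-of-reference (FOR) on the resulting
integers, and a separate list of exceptions. It is a simplification and not the
layout in the spec: it uses a single hypothetical `exponent` parameter instead
of ALP's exponent/factor pair, keeps the FOR deltas as a plain list rather than
bit-packing them, and the function names are made up for illustration.

```python
import math


def alp_encode(values, exponent):
    """Scale floats by 10**exponent, apply FOR, and collect exceptions.

    `exponent` is a hypothetical stand-in for ALP's sampled parameters.
    """
    scale = 10.0 ** exponent
    encoded = []
    exceptions = []  # (position, original value) pairs
    for i, v in enumerate(values):
        scaled = v * scale
        if math.isfinite(scaled):
            d = round(scaled)
            # Compared by value; a real encoder would compare bit patterns
            # so that cases such as -0.0 are also treated as exceptions.
            if d / scale == v:
                encoded.append(d)
                continue
        encoded.append(None)
        exceptions.append((i, v))

    # Fill exception slots with an existing value so they do not widen
    # the bit width of the FOR deltas.
    fill = next((d for d in encoded if d is not None), 0)
    encoded = [fill if d is None else d for d in encoded]

    base = min(encoded) if encoded else 0
    deltas = [d - base for d in encoded]
    bit_width = max((x.bit_length() for x in deltas), default=0)
    return base, bit_width, deltas, exceptions


def alp_decode(base, deltas, exceptions, exponent):
    """Undo FOR and scaling, then patch the exceptions back in."""
    scale = 10.0 ** exponent
    out = [(base + x) / scale for x in deltas]
    for pos, v in exceptions:
        out[pos] = v
    return out


values = [1.25, 3.5, 2.75, 0.1234567, 4.0]
base, bit_width, deltas, exceptions = alp_encode(values, exponent=2)
decoded = alp_decode(base, deltas, exceptions, exponent=2)
```

Because the FOR deltas are just integers with a known bit width, how they are
bit-packed (linear or interleaved) can change without touching the exception
path, and the exception list is only visited for the few values that failed to
round-trip.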


Best
Prateek

On Wed, Jan 21, 2026 at 1:57 PM Andrew Lamb <[email protected]> wrote:

> First of all, thank you again for this spec. I would recommend anyone else
> curious about ALP (or wanting to read a well written technical spec) to
> read Prateek's document -- it is really nice.
>
> I would like to raise two more items (I am not sure the spec needs to be
> changed to accommodate them, but I do think we should discuss them):
>
> 1) Interleaving the bitpacked values (this was suggested by Peter Boncz).
> Specifically, I recommend we consider the technique described in the
> FASTLANES paper[1] (figure 1) that interleaves bit-packed values in a
> pattern that enables decoding multiple values using a single
> SIMD instruction and is GPU friendly. To be clear we don't need to
> implement all of the techniques described in that paper, but I think the
> interleaving is worth considering. It seems like the current prototype uses
> linear bitpacking[2].
>
> 2) The layout for exceptions, specifically making sure that the spec allows
> other potential layouts in the future to make them more GPU friendly. One
> proposal is in the G-ALP[3] paper, but it comes with tradeoffs (e.g. it
> requires additional storage overhead).
>
> Andrew
>
>
> [1]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
> [2]: https://github.com/apache/arrow/pull/48345/changes#diff-f9ab708cab94060b4067fff0a6739e9c3751b450422115663b2bd0badfcc748bR801
> [3]: https://dl.acm.org/doi/10.1145/3736227.3736242
>
> On Wed, Jan 14, 2026 at 3:21 PM Andrew Lamb <[email protected]>
> wrote:
>
> > Here is a PR that turns Prateek's document into markdown in the
> > parquet-format repo
> > - https://github.com/apache/parquet-format/pull/548
> >
> > I am a little worried we will have two sets of parallel comments (one in
> > the google doc and one in the PR)
> >
> > However, the spec is of sufficient quality (thanks, again Prateek) that it
> > would be possible for another language implementation to be attempted.
> >
> > Andrew
> >
> >
> >
> > On Wed, Jan 14, 2026 at 8:54 AM Andrew Lamb <[email protected]>
> > wrote:
> >
> >> I plan to help turn the document into a PR to parquet-format later today
> >>
> >> And again thank you Prateek and everyone for helping make this happen
> >>
> >> Andrew
> >>
> >> On Wed, Jan 14, 2026 at 6:34 AM Antoine Pitrou <[email protected]>
> >> wrote:
> >>
> >>>
> >>> Yes, I'd really rather comment on the final spec, rather than a Google
> >>> doc.
> >>>
> >>> (also, Google Doc comments are not terrific for non-trivial
> >>> discussions)
> >>>
> >>>
> >>> On 14/01/2026 at 10:37, Gang Wu wrote:
> >>> > Is it better to create a PR against
> >>> > https://github.com/apache/parquet-format so it can become the
> >>> > single source of truth of the Parquet-ALP spec?
> >>> >
> >>> > On Wed, Jan 14, 2026 at 9:34 AM Julien Le Dem <[email protected]> wrote:
> >>> >
> >>> >> Thank you Micah for the detailed review!
> >>> >> Who else needs to do a round of reviews on the spec before we can
> >>> >> finalize it?
> >>>
> >>>
> >>>
>
