>
> 1) Interleaving the bitpacked values (this was suggested by Peter Boncz).
> Specifically, I recommend we consider the technique described in the
> FASTLANES paper[1] (figure 1) that interleaves bit-packed values in a
> pattern that enables decoding multiple values using a single
> SIMD instruction and is GPU friendly. To be clear we don't need to
> implement all of the techniques described in that paper, but I think the
> interleaving is worth considering. It seems like the current prototype uses
> linear bitpacking[2]


I think we touched on this briefly in a sync, but if I recall correctly
linear encoding was chosen because we already have these routines written
for DELTA_BINARY_PACKED. I think the current design is already extensible
enough to support other types of integer encodings, though I might be
misunderstanding. Would this require a more fundamental change to the data
layout as proposed (i.e. something we can't plug in by adding a new integer
encoding)?

If it isn't a fundamental change, then unless we have a volunteer to
implement it immediately, I think we can defer this to follow-up work on
integer encodings, and add it as an option to ALP when it becomes
available. I want to be careful about moving the goalposts here.

> 2) The layout for exceptions, specifically making sure that the spec allows
> other potential layouts in the future to make them more GPU friendly. One
> proposal is in the G-ALP[3] paper, but it comes with tradeoffs (e.g. it
> requires additional storage overhead).


I think changing the exception layout would be handled by the version enum
in the current proposal?

Cheers,
Micah


On Wed, Jan 21, 2026 at 1:57 PM Andrew Lamb <[email protected]> wrote:

> First of all, thank you again for this spec. I would recommend anyone else
> curious about ALP (or wanting to read a well written technical spec) to
> read Prateek's document -- it is really nice.
>
> I would like to raise two more items (I am not sure the spec needs to be
> changed to accommodate them, but I do think we should discuss them):
>
> 1) Interleaving the bitpacked values (this was suggested by Peter Boncz).
> Specifically, I recommend we consider the technique described in the
> FASTLANES paper[1] (figure 1) that interleaves bit-packed values in a
> pattern that enables decoding multiple values using a single
> SIMD instruction and is GPU friendly. To be clear we don't need to
> implement all of the techniques described in that paper, but I think the
> interleaving is worth considering. It seems like the current prototype uses
> linear bitpacking[2]
>
> 2) The layout for exceptions, specifically making sure that the spec allows
> other potential layouts in the future to make them more GPU friendly. One
> proposal is in the G-ALP[3] paper, but it comes with tradeoffs (e.g. it
> requires additional storage overhead).
>
> Andrew
>
>
> [1]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
> [2]:
>
> https://github.com/apache/arrow/pull/48345/changes#diff-f9ab708cab94060b4067fff0a6739e9c3751b450422115663b2bd0badfcc748bR801
> [3]: https://dl.acm.org/doi/10.1145/3736227.3736242
>
> On Wed, Jan 14, 2026 at 3:21 PM Andrew Lamb <[email protected]>
> wrote:
>
> > Here is a PR that turns Prateek's document into markdown in the
> > parquet-format repo
> > - https://github.com/apache/parquet-format/pull/548
> >
> > I am a little worried we will have two sets of parallel comments (one in
> > the google doc and one in the PR)
> >
> > However, the spec is of sufficient quality (thanks, again Prateek) that it
> > would be possible for another language implementation to be attempted.
> >
> > Andrew
> >
> >
> >
> > On Wed, Jan 14, 2026 at 8:54 AM Andrew Lamb <[email protected]>
> > wrote:
> >
> >> I plan to help turn the document into a PR to parquet-format later today
> >>
> >> And again thank you Prateek and everyone for helping make this happen
> >>
> >> Andrew
> >>
> >> On Wed, Jan 14, 2026 at 6:34 AM Antoine Pitrou <[email protected]>
> >> wrote:
> >>
> >>>
> >>> Yes, I'd really rather comment on the final spec, rather than a Google
> >>> doc.
> >>>
> >>> (also, Google Doc comments are not terrific for non-trivial
> >>> discussions)
> >>>
> >>>
> >>> On 14/01/2026 at 10:37, Gang Wu wrote:
> >>> > Is it better to create a PR against
> >>> > https://github.com/apache/parquet-format so it can become the single
> >>> > source of truth of the Parquet-ALP spec?
> >>> >
> >>> > On Wed, Jan 14, 2026 at 9:34 AM Julien Le Dem <[email protected]> wrote:
> >>> >
> >>> >> Thank you Micah for the detailed review!
> >>> >> Who else needs to do a round of reviews on the spec before we can
> >>> >> finalize it?
> >>>
> >>>
> >>>
>
