Awesome,

That was fast :). I'll look at it in detail and see if I can fill in any
missing details.
Thanks for taking a look at the 'cross compatibility tests'. That'll strike
a big item off the TODO list.

Best
Prateek

On Thu, Jan 22, 2026 at 8:59 AM Julien Le Dem <[email protected]> wrote:

> Following Micah's suggestion yesterday, I took a stab at using Claude to
> produce a java implementation of ALP based on Prateek's spec and c++
> implementation.
> https://github.com/apache/parquet-java/pull/3390
> Bear in mind that I haven't closely reviewed it yet, it is fairly
> experimental but it seems promising.
> I will look into running cross compatibility tests with the cpp
> implementation.
>
> On Wed, Jan 21, 2026 at 2:53 PM Andrew Lamb <[email protected]>
> wrote:
>
> > > Would this require a more fundamental change to the data layout as
> > > proposed (i.e. something we can't plugin by adding a new integer
> > > encoding)?
> >
> > > We can plugin a new layout, it would just be an enum change which
> > > triggers new code path. We would have to swap out bit unpacker which I
> > > used because it was already present in arrow code base. I agree that
> > > fastlanes would be good
> >
> > I agree with both of your assessments that this could be added in the
> > future with the current spec.
> >
> > Thanks for the clarifications
> >
> > On Wed, Jan 21, 2026 at 5:38 PM PRATEEK GAUR <[email protected]> wrote:
> >
> > > >
> > > >
> > > > I think we touched on this briefly in a sync but linear encoding was
> > > chosen
> > > > because we already have these routines written for
> > DELTA_BINARY_PACKED? I
> > > > think the current design is extensible now to support other types of
> > > > integer encodings.  Or I might be misunderstanding. Would this
> require
> > a
> > > > more fundamental change to the data layout as proposed (i.e.
> something
> > we
> > > > can't plugin by adding a new integer encoding)?
> > > >
> > >
> > > We can plugin a new layout, it would just be an enum change which
> > triggers
> > > new
> > > > code path. We would have to swap out bit unpacker which I used
> > because
> > > it was already present in arrow code base. I agree that fastlanes would
> > be
> > > good
> > > to have but that is also a more fundamental building block which I'm
> > happy
> > > to
> > > take up outside the ALP effort and then integrate it with ALP later on
> > > given ALP
> > > allows a mechanism to deal with it with minimal changes.
> > >
> > > > I fear that fastlanes, and the need to implement it in all languages,
> > > > can potentially slow down the project.
> > >
> > >
> > >
> > > > If it isn't a fundamental change, unless we have a volunteer to
> > implement
> > > > it immediately, I think we can maybe defer this for follow-up work on
> > > > integer encodings, and then add it as an option to ALP when it
> becomes
> > > > available. I want to be careful of moving the goal-posts here.
> > > >
> > >
> > > Okay you and I are thinking along the same lines :).
> > >
> > >
> > > >
> > > > 2) The layout for exceptions, specifically making sure that the spec
> > > allows
> > > > > other potential layouts in the future to make them more GPU
> friendly.
> > > One
> > > > > proposal is in the G-ALP[3] paper, but it comes with tradeoffs
> (e.g.
> > it
> > > > > requires additional storage overhead).
> > > >
> > > >
> > > > I think changing the exception layout would be handled by the version
> > > enum
> > > > in the current proposal?
> > > >
> > >
> > > Yes, current spec allows for this.
> > >
> > >
> > > >
> > > > Cheers,
> > > > Micah
> > > >
> > > >
> > > > On Wed, Jan 21, 2026 at 1:57 PM Andrew Lamb <[email protected]>
> > > > wrote:
> > > >
> > > > > First of all, thank you again for this spec. I would recommend
> anyone
> > > > else
> > > > > curious about ALP (or wanting to read a well written technical
> spec)
> > to
> > > > > read Prateek's document -- it is really nice.
> > > > >
> > > > > I would like to raise two more items (I am not sure the spec needs
> to
> > > be
> > > > > changed to accommodate them, but I do think we should discuss
> them):
> > > > >
> > > > > 1) Interleaving the bitpacked values (this was suggested by Peter
> > > Boncz).
> > > > > Specifically, I recommend we consider the technique described in
> the
> > > > > FASTLANES paper[1] (figure 1) that interleaves bit-packed values
> in a
> > > > > pattern that enables decoding multiple values using a single
> > > > > SIMD instruction and is GPU friendly. To be clear we don't need to
> > > > > implement all of the techniques described in that paper, but I
> think
> > > the
> > > > > interleaving is worth considering. It seems like the current
> > prototype
> > > > uses
> > > > > linear bitpacking[2]
> > > > >
> > > > > 2) The layout for exceptions, specifically making sure that the
> spec
> > > > allows
> > > > > other potential layouts in the future to make them more GPU
> friendly.
> > > One
> > > > > proposal is in the G-ALP[3] paper, but it comes with tradeoffs
> (e.g.
> > it
> > > > > requires additional storage overhead).
> > > > >
> > > > > Andrew
> > > > >
> > > > >
> > > > > [1]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
> > > > > [2]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/arrow/pull/48345/changes#diff-f9ab708cab94060b4067fff0a6739e9c3751b450422115663b2bd0badfcc748bR801
> > > > > [3]: https://dl.acm.org/doi/10.1145/3736227.3736242
> > > > >
> > > > > On Wed, Jan 14, 2026 at 3:21 PM Andrew Lamb <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Here is a PR that turns Prateek's document into markdown in the
> > > > > > parquet-format repo
> > > > > > - https://github.com/apache/parquet-format/pull/548
> > > > > >
> > > > > > > I am a little worried we will have two sets of parallel comments
> > (one
> > > in
> > > > > > the google doc and one in the PR)
> > > > > >
> > > > > > However, the spec is of sufficient quality (thanks, again
> Prateek)
> > > that
> > > > > it
> > > > > > would be possible for another language implementation to be
> > > attempted.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 14, 2026 at 8:54 AM Andrew Lamb <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > >> I plan to help turn the document into a PR to parquet-format
> later
> > > > today
> > > > > >>
> > > > > >> And again thank you Prateek and everyone for helping make this
> > > happen
> > > > > >>
> > > > > >> Andrew
> > > > > >>
> > > > > >> On Wed, Jan 14, 2026 at 6:34 AM Antoine Pitrou <
> > [email protected]>
> > > > > >> wrote:
> > > > > >>
> > > > > >>>
> > > > > >>> Yes, I'd really rather comment on the final spec, rather than a
> > > > Google
> > > > > >>> doc.
> > > > > >>>
> > > > > >>> (also, Google Doc comments are not terrific for non-trivial
> > > > > discussions)
> > > > > >>>
> > > > > >>>
> > > > > >>> Le 14/01/2026 à 10:37, Gang Wu a écrit :
> > > > > >>> > Is it better to create a PR against
> > > > > >>> https://github.com/apache/parquet-format
> > > > > >>> > so
> > > > > >>> > it can become the single source of truth of the Parquet-ALP
> > spec?
> > > > > >>> >
> > > > > >>> > On Wed, Jan 14, 2026 at 9:34 AM Julien Le Dem <
> > [email protected]
> > > >
> > > > > >>> wrote:
> > > > > >>> >
> > > > > >>> >> Thank you Micah for the detailed review!
> > > > > >>> >> Who else needs to do a round of reviews on the spec before
> we
> > > can
> > > > > >>> finalize
> > > > > >>> >> it?
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
>
