I have also taken the liberty of soliciting feedback (links below, for my own reference) from other open-source implementations listed on our implementation status page, in case they would like to help with the process and share their experience implementing and maintaining the current encodings.
Andrew

https://github.com/pola-rs/polars/issues/26279
https://github.com/duckdb/duckdb/discussions/20665
https://github.com/rapidsai/cudf/issues/21173
https://github.com/apache/arrow-go/issues/646
https://github.com/hyparam/hyparquet/issues/151

On Thu, Jan 22, 2026 at 12:12 PM PRATEEK GAUR <[email protected]> wrote:

> Awesome,
>
> That was fast :). I'll look at it in detail and see if I can fill in
> any missing details (if they are present).
> Thanks for taking a look at the 'cross compatibility tests'. That'll strike
> off a big item from the TODO list.
>
> Best
> Prateek
>
> On Thu, Jan 22, 2026 at 8:59 AM Julien Le Dem <[email protected]> wrote:
>
> > Following Micah's suggestion yesterday, I took a stab at using Claude to
> > produce a java implementation of ALP based on Prateek's spec and c++
> > implementation.
> > https://github.com/apache/parquet-java/pull/3390
> > Bear in mind that I haven't closely reviewed it yet, it is fairly
> > experimental but it seems promising.
> > I will look into running cross compatibility tests with the cpp
> > implementation.
> >
> > On Wed, Jan 21, 2026 at 2:53 PM Andrew Lamb <[email protected]> wrote:
> >
> > > > > Would this require a more fundamental change to the data layout
> > > > > as proposed (i.e. something we can't plug in by adding a new
> > > > > integer encoding)?
> > > >
> > > > We can plug in a new layout, it would just be an enum change which
> > > > triggers new code path. We would have to swap out bit unpacker which
> > > > I used because it was already present in arrow code base. I agree
> > > > that fastlanes would be good
> > >
> > > I agree with both of your assessments that this could be added in the
> > > future with the current spec.
> > >
> > > Thanks for the clarifications
> > >
> > > On Wed, Jan 21, 2026 at 5:38 PM PRATEEK GAUR <[email protected]> wrote:
> > >
> > > > > I think we touched on this briefly in a sync but linear encoding was
> > > > > chosen because we already have these routines written for
> > > > > DELTA_BINARY_PACKED? I think the current design is extensible now to
> > > > > support other types of integer encodings. Or I might be
> > > > > misunderstanding. Would this require a more fundamental change to the
> > > > > data layout as proposed (i.e. something we can't plug in by adding a
> > > > > new integer encoding)?
> > > >
> > > > We can plug in a new layout, it would just be an enum change which
> > > > triggers new code path. We would have to swap out bit unpacker which I
> > > > used because it was already present in arrow code base. I agree that
> > > > fastlanes would be good to have but that is also a more fundamental
> > > > building block which I'm happy to take up outside the ALP effort and
> > > > then integrate it with ALP later on given ALP allows a mechanism to
> > > > deal with it with minimal changes.
> > > >
> > > > I fear that fastlanes, and the need to implement it in all languages,
> > > > can potentially slow down the project.
> > > >
> > > > > If it isn't a fundamental change, unless we have a volunteer to
> > > > > implement it immediately, I think we can maybe defer this for
> > > > > follow-up work on integer encodings, and then add it as an option to
> > > > > ALP when it becomes available. I want to be careful of moving the
> > > > > goal-posts here.
> > > >
> > > > Okay you and I are thinking along the same lines :).
> > > >
> > > > > > 2) The layout for exceptions, specifically making sure that the spec
> > > > > > allows other potential layouts in the future to make them more GPU
> > > > > > friendly. One proposal is in the G-ALP[3] paper, but it comes with
> > > > > > tradeoffs (e.g. it requires additional storage overhead).
> > > > >
> > > > > I think changing the exception layout would be handled by the version
> > > > > enum in the current proposal?
> > > >
> > > > Yes, current spec allows for this.
> > > >
> > > > > Cheers,
> > > > > Micah
> > > > >
> > > > > On Wed, Jan 21, 2026 at 1:57 PM Andrew Lamb <[email protected]> wrote:
> > > > >
> > > > > > First of all, thank you again for this spec. I would recommend anyone
> > > > > > else curious about ALP (or wanting to read a well written technical
> > > > > > spec) to read Prateek's document -- it is really nice.
> > > > > >
> > > > > > I would like to raise two more items (I am not sure the spec needs to
> > > > > > be changed to accommodate them, but I do think we should discuss
> > > > > > them):
> > > > > >
> > > > > > 1) Interleaving the bitpacked values (this was suggested by Peter
> > > > > > Boncz). Specifically, I recommend we consider the technique described
> > > > > > in the FASTLANES paper[1] (figure 1) that interleaves bit-packed
> > > > > > values in a pattern that enables decoding multiple values using a
> > > > > > single SIMD instruction and is GPU friendly. To be clear we don't
> > > > > > need to implement all of the techniques described in that paper, but
> > > > > > I think the interleaving is worth considering.
> > > > > > It seems like the current prototype uses linear bitpacking[2]
> > > > > >
> > > > > > 2) The layout for exceptions, specifically making sure that the spec
> > > > > > allows other potential layouts in the future to make them more GPU
> > > > > > friendly. One proposal is in the G-ALP[3] paper, but it comes with
> > > > > > tradeoffs (e.g. it requires additional storage overhead).
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > > [1]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
> > > > > > [2]: https://github.com/apache/arrow/pull/48345/changes#diff-f9ab708cab94060b4067fff0a6739e9c3751b450422115663b2bd0badfcc748bR801
> > > > > > [3]: https://dl.acm.org/doi/10.1145/3736227.3736242
> > > > > >
> > > > > > On Wed, Jan 14, 2026 at 3:21 PM Andrew Lamb <[email protected]> wrote:
> > > > > >
> > > > > > > Here is a PR that turns Prateek's document into markdown in the
> > > > > > > parquet-format repo
> > > > > > > - https://github.com/apache/parquet-format/pull/548
> > > > > > >
> > > > > > > I am a little worried we will have two sets of parallel comments
> > > > > > > (one in the google doc and one in the PR)
> > > > > > >
> > > > > > > However, the spec is of sufficient quality (thanks, again Prateek)
> > > > > > > that it would be possible for another language implementation to be
> > > > > > > attempted.
> > > > > > >
> > > > > > > Andrew
> > > > > > >
> > > > > > > On Wed, Jan 14, 2026 at 8:54 AM Andrew Lamb <[email protected]> wrote:
> > > > > > >
> > > > > > > > I plan to help turn the document into a PR to parquet-format later
> > > > > > > > today
> > > > > > > >
> > > > > > > > And again thank you Prateek and everyone for helping make this
> > > > > > > > happen
> > > > > > > >
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > On Wed, Jan 14, 2026 at 6:34 AM Antoine Pitrou <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Yes, I'd really rather comment on the final spec, rather than a
> > > > > > > > > Google doc.
> > > > > > > > >
> > > > > > > > > (also, Google Doc comments are not terrific for non-trivial
> > > > > > > > > discussions)
> > > > > > > > >
> > > > > > > > > On 14/01/2026 at 10:37, Gang Wu wrote:
> > > > > > > > >
> > > > > > > > > > Is it better to create a PR against
> > > > > > > > > > https://github.com/apache/parquet-format
> > > > > > > > > > so it can become the single source of truth of the Parquet-ALP
> > > > > > > > > > spec?
> > > > > > > > > >
> > > > > > > > > > On Wed, Jan 14, 2026 at 9:34 AM Julien Le Dem <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Thank you Micah for the detailed review!
> > > > > > > > > > > Who else needs to do a round of reviews on the spec before we
> > > > > > > > > > > can finalize it?
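
For readers following the bit-packing discussion in this thread, the "linear" layout can be illustrated with a minimal Python sketch. This is not taken from the Arrow prototype or the ALP spec: the function names `pack_linear` and `unpack_linear` are hypothetical, and the LSB-first bit order is an assumption made for the example. It only shows the idea of storing fixed-width values back to back in one contiguous bit stream, which is the layout the thread contrasts with the interleaved pattern from the FASTLANES paper.

```python
def pack_linear(values, bit_width):
    """Pack non-negative ints back to back, LSB first (assumed bit order)."""
    mask = (1 << bit_width) - 1
    acc = 0      # bits waiting to be flushed to the output
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if v < 0 or v > mask:
            raise ValueError(f"{v} does not fit in {bit_width} bits")
        acc |= v << nbits          # append the value above the pending bits
        nbits += bit_width
        while nbits >= 8:          # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the final partial byte, zero padded
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_linear(data, bit_width, count):
    """Inverse of pack_linear: read `count` values of `bit_width` bits each."""
    mask = (1 << bit_width) - 1
    stream = int.from_bytes(data, "little")   # whole buffer as one bit stream
    return [(stream >> (i * bit_width)) & mask for i in range(count)]


if __name__ == "__main__":
    values = [1, 5, 3, 7, 0, 2]
    packed = pack_linear(values, 3)            # 6 values * 3 bits = 18 bits
    print(packed.hex(), unpack_linear(packed, 3, len(values)))
```

In this layout value `i` starts at bit offset `i * bit_width`, so neighbouring values can straddle byte boundaries. As described in the thread, the interleaved alternative arranges the values so that a single SIMD instruction can decode several of them at once, and it could be selected later through the enum mentioned above.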
