Awesome, that was fast :). I'll look at it in detail and see if I can fill in any missing details (if there are any). Thanks for taking a look at the 'cross compatibility tests'. That'll strike off a big item from the TODO list.
Best
Prateek

On Thu, Jan 22, 2026 at 8:59 AM Julien Le Dem <[email protected]> wrote:

> Following Micah's suggestion yesterday, I took a stab at using Claude to
> produce a java implementation of ALP based on Prateek's spec and c++
> implementation.
> https://github.com/apache/parquet-java/pull/3390
> Bear in mind that I haven't closely reviewed it yet, it is fairly
> experimental but it seems promising.
> I will look into running cross compatibility tests with the cpp
> implementation.
>
> On Wed, Jan 21, 2026 at 2:53 PM Andrew Lamb <[email protected]> wrote:
>
> > > Would this require a more fundamental change to the data layout as
> > > proposed (i.e. something we can't plug in by adding a new integer
> > > encoding)?
> >
> > > We can plug in a new layout, it would just be an enum change which
> > > triggers a new code path. We would have to swap out the bit unpacker,
> > > which I used because it was already present in the Arrow code base.
> > > I agree that fastlanes would be good
> >
> > I agree with both of your assessments that this could be added in the
> > future with the current spec.
> >
> > Thanks for the clarifications
> >
> > On Wed, Jan 21, 2026 at 5:38 PM PRATEEK GAUR <[email protected]> wrote:
> >
> > > > I think we touched on this briefly in a sync but linear encoding was
> > > > chosen because we already have these routines written for
> > > > DELTA_BINARY_PACKED? I think the current design is extensible now to
> > > > support other types of integer encodings. Or I might be
> > > > misunderstanding. Would this require a more fundamental change to the
> > > > data layout as proposed (i.e. something we can't plug in by adding a
> > > > new integer encoding)?
> > >
> > > We can plug in a new layout, it would just be an enum change which
> > > triggers a new code path. We would have to swap out the bit unpacker,
> > > which I used because it was already present in the Arrow code base.
> > > I agree that fastlanes would be good to have, but that is also a more
> > > fundamental building block which I'm happy to take up outside the ALP
> > > effort and then integrate it with ALP later on, given ALP allows a
> > > mechanism to deal with it with minimal changes.
> > >
> > > I fear that fastlanes, and the need to implement it in all languages,
> > > could potentially slow down the project.
> > >
> > > > If it isn't a fundamental change, unless we have a volunteer to
> > > > implement it immediately, I think we can maybe defer this for
> > > > follow-up work on integer encodings, and then add it as an option to
> > > > ALP when it becomes available. I want to be careful of moving the
> > > > goal-posts here.
> > >
> > > Okay you and I are thinking along the same lines :).
> > >
> > > > > 2) The layout for exceptions, specifically making sure that the spec
> > > > > allows other potential layouts in the future to make them more GPU
> > > > > friendly. One proposal is in the G-ALP[3] paper, but it comes with
> > > > > tradeoffs (e.g. it requires additional storage overhead).
> > > >
> > > > I think changing the exception layout would be handled by the version
> > > > enum in the current proposal?
> > >
> > > Yes, current spec allows for this.
> > >
> > > > Cheers,
> > > > Micah
> > > >
> > > > On Wed, Jan 21, 2026 at 1:57 PM Andrew Lamb <[email protected]> wrote:
> > > >
> > > > > First of all, thank you again for this spec. I would recommend anyone
> > > > > else curious about ALP (or wanting to read a well written technical
> > > > > spec) to read Prateek's document -- it is really nice.
> > > > >
> > > > > I would like to raise two more items (I am not sure the spec needs to
> > > > > be changed to accommodate them, but I do think we should discuss
> > > > > them):
> > > > >
> > > > > 1) Interleaving the bitpacked values (this was suggested by Peter
> > > > > Boncz). Specifically, I recommend we consider the technique described
> > > > > in the FASTLANES paper[1] (figure 1) that interleaves bit-packed
> > > > > values in a pattern that enables decoding multiple values using a
> > > > > single SIMD instruction and is GPU friendly. To be clear we don't need
> > > > > to implement all of the techniques described in that paper, but I
> > > > > think the interleaving is worth considering. It seems like the current
> > > > > prototype uses linear bitpacking[2]
> > > > >
> > > > > 2) The layout for exceptions, specifically making sure that the spec
> > > > > allows other potential layouts in the future to make them more GPU
> > > > > friendly. One proposal is in the G-ALP[3] paper, but it comes with
> > > > > tradeoffs (e.g. it requires additional storage overhead).
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
> > > > > [2]: https://github.com/apache/arrow/pull/48345/changes#diff-f9ab708cab94060b4067fff0a6739e9c3751b450422115663b2bd0badfcc748bR801
> > > > > [3]: https://dl.acm.org/doi/10.1145/3736227.3736242
> > > > >
> > > > > On Wed, Jan 14, 2026 at 3:21 PM Andrew Lamb <[email protected]> wrote:
> > > > >
> > > > > > Here is a PR that turns Prateek's document into markdown in the
> > > > > > parquet-format repo
> > > > > > - https://github.com/apache/parquet-format/pull/548
> > > > > >
> > > > > > I am a little worried we will have two sets of parallel comments
> > > > > > (one in the google doc and one in the PR)
> > > > > >
> > > > > > However, the spec is of sufficient quality (thanks, again Prateek)
> > > > > > that it would be possible for another language implementation to be
> > > > > > attempted.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > > On Wed, Jan 14, 2026 at 8:54 AM Andrew Lamb <[email protected]> wrote:
> > > > > >
> > > > > >> I plan to help turn the document into a PR to parquet-format later
> > > > > >> today
> > > > > >>
> > > > > >> And again thank you Prateek and everyone for helping make this
> > > > > >> happen
> > > > > >>
> > > > > >> Andrew
> > > > > >>
> > > > > >> On Wed, Jan 14, 2026 at 6:34 AM Antoine Pitrou <[email protected]> wrote:
> > > > > >>
> > > > > >>>
> > > > > >>> Yes, I'd really rather comment on the final spec, rather than a
> > > > > >>> Google doc.
> > > > > >>>
> > > > > >>> (also, Google Doc comments are not terrific for non-trivial
> > > > > >>> discussions)
> > > > > >>>
> > > > > >>>
> > > > > >>> On 14/01/2026 at 10:37, Gang Wu wrote:
> > > > > >>> > Is it better to create a PR against
> > > > > >>> > https://github.com/apache/parquet-format so it can become the
> > > > > >>> > single source of truth of the Parquet-ALP spec?
> > > > > >>> >
> > > > > >>> > On Wed, Jan 14, 2026 at 9:34 AM Julien Le Dem <[email protected]> wrote:
> > > > > >>> >
> > > > > >>> >> Thank you Micah for the detailed review!
> > > > > >>> >> Who else needs to do a round of reviews on the spec before we
> > > > > >>> >> can finalize it?
