Hi Kakimura,

> By the way, do you think it's necessary to implement ALP directly within
> Parquet to evaluate its performance?

From my perspective, the algorithm's performance is well explained in the
paper[1].

I suggest there are 2 milestones:
1. Gather any additional evidence that the algorithm is worth pursuing
   (e.g. perhaps apply to your data, or independently reproduce the results
   in the paper)
2. Make the case / proposal to add to Parquet.

Perhaps a good first thing to try would be your datasets with the Vortex[2]
file format (which has an implementation of ALP)

When we get to step 2, I do think we'll need to integrate with two Parquet
implementations.

> Just as a note, in many cases floating point values are better stored as
> scaled integers.

Andrew, indeed you are right. In fact the core ALP algorithm is transforming
from floating point to scaled integers (and then applying the techniques
from FastLanes[3] which auto-vectorizes well)

Andrew

[1]: https://dl.acm.org/doi/10.1145/3626717
[2]: https://github.com/vortex-data/vortex
[3]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf

On Tue, Oct 7, 2025 at 8:26 AM [email protected] <[email protected]> wrote:

> Thank you Andrew,
>
> I'll take a look at the C/C++ and Rust implementations.
>
> By the way, do you think it's necessary to implement ALP directly within
> Parquet to evaluate its performance? Or would it be sufficient to benchmark
> it using the implementations you mentioned without integrating it into
> Parquet, just to get a sense of its potential?
>
> Naohiro
>
> On 2025/10/03 13:21:03 Andrew Lamb wrote:
> > This is super exciting, thank you Naohiro
> >
> > I also think ALP[1] (built on FastLanes[2]) is a great encoding to
> > explore
> >
> > Getting a Java based implementation of ALP would be a great validation
> > that the approach works well across platforms. There are open source
> > implementations in both C/C++[3] and Rust (via vortex) [4] that we could
> > use to benchmark / build prototypes
> >
> > Andrew
> >
> > [1]: https://ir.cwi.nl/pub/33334/33334.pdf
> > [2]: https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf
> > [3]: https://github.com/cwida/FastLanes/tree/4014a3a51083a06b6d446fb78e446494721aa12b/src/alp
> > [4]: https://github.com/vortex-data/vortex/blob/153040140e72d9038f5c092e6c6348c28a462211/encodings/alp/src/lib.rs#L4
> >
> > On Fri, Oct 3, 2025 at 12:22 AM [email protected] <[email protected]> wrote:
> >
> > > Hi Andrew,
> > >
> > > I'm Naohiro, and I'm the person Julien has been in touch with. I was
> > > planning to attend the sync yesterday but unfortunately missed it due
> > > to the timezone difference. (I’m in Japan)
> > >
> > > Thanks for kicking off this discussion, I'm definitely interested in
> > > contributing.
> > >
> > > To
