Hi team,

Regarding the discussion on the presence of the version field, a concern raised by Antoine.

Antoine suggested removing the version field from the ALP header:

0 | version | 1 byte | uint8 | Format version (current is 1)

The concern raised was along the lines of it complicating the API. Having thought this through more, I now agree with Antoine. In slightly more formal words:

*Parquet's standard practice is to handle format evolution at the Thrift metadata level by introducing a new Encoding enum, **rather than embedding version numbers within the page's binary data stream.** Managing versioning through metadata allows readers to immediately identify unsupported formats and fail gracefully before they even begin reading or decoding the page.*

We touched upon it in the last Parquet meeting, and I remember people in the meeting also agreed with the suggestion.

Now the issue is that the header was exactly 8 bytes :). Do I keep 1 byte as padding or move the header to 7 bytes? Any recommendations?

On Tue, Feb 17, 2026 at 4:27 PM PRATEEK GAUR <[email protected]> wrote:

> Hi team,
>
> 1) Andrew
>
> - Thanks for working on test files. My PR did add all the test files I used to benchmark on datasets. Maybe we can club them together. That will also aid cross-language testing.
> - Kosta Tarasov is working on a Rust implementation. This is great. Thanks.
>
> 2) Antoine
>
> - Thanks a lot for reporting the numbers on AMD. Looks like you are getting 8X the decoding performance of BSS. This is amazing!
> - Thanks for acknowledging the sampling design.
> - I agree with you on FastLanes. In some crude experiments I didn't get a good perf benefit from it on Graviton3 (but maybe there was something wrong with my implementation).
> - Locking the 16-bit exception encoding for the spec in this case.
> - Awesome, I think we have solved all open questions minus the version byte :). (Will get back on this soon.)
>
> 3) Micah
>
> - FastLanes: The current spec does allow for using FastLanes with the configurable enum value for layout. We should be able to inject any layout in the current design.
>
> Working on resolving all remaining open comments on the spec this week.
>
> Best
> Prateek
>
> On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]> wrote:
>
>> On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]> wrote:
>>
>> > It looks like the actual issue described for ORC in the paper is that it has multiple sub-encodings in a batch. This is different than the design proposed here, where there is still a fixed encoding per page in Parquet. Given reasonably sized pages I don't think branch misprediction should be a big issue for new encodings. I agree that we should be conservative in general about adding new encodings.
>>
>> +1
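
For reference, the metadata-level versioning argument in the top message (a new Encoding enum value per format revision, checked before any page bytes are touched) could look roughly like the sketch below. Everything in it is an assumption for illustration: the enum values, the `ALP_V2` name, the `SUPPORTED` set, and `check_page_encoding` are hypothetical and do not come from parquet.thrift or any existing reader.

```python
from enum import IntEnum


class Encoding(IntEnum):
    # Hypothetical numbering for illustration; not the real parquet.thrift values.
    PLAIN = 0
    ALP = 100
    ALP_V2 = 101  # a future format revision would get its own enum value


class UnsupportedEncodingError(Exception):
    """Raised when page metadata names an encoding this reader cannot decode."""


# Encodings this (hypothetical) reader knows how to decode.
SUPPORTED = {Encoding.PLAIN, Encoding.ALP}


def check_page_encoding(encoding_id: int) -> Encoding:
    """Validate the encoding from metadata alone, before reading page bytes."""
    try:
        encoding = Encoding(encoding_id)
    except ValueError:
        # The id is not known to this reader at all (e.g. written by a newer writer).
        raise UnsupportedEncodingError(f"unknown encoding id {encoding_id}") from None
    if encoding not in SUPPORTED:
        raise UnsupportedEncodingError(f"unsupported encoding {encoding.name}")
    return encoding
```

With this shape, a reader that only knows `ALP` rejects an `ALP_V2` page from its metadata, so no version byte inside the page header is needed to make that decision.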
