> Now the issue is that the header was exactly 8 bytes :). Do I keep 1 byte
> as padding or move the header to 7 bytes?
> Any recommendations?

I think it is fine for it to be 7 bytes, despite the asymmetry. We aren't
likely to be loading these on aligned boundaries IIRC, and we are using safe
loading anyway?

On Tue, Feb 24, 2026 at 12:52 PM PRATEEK GAUR <[email protected]> wrote:

> Hi team,
>
> Regarding the concern Antoine raised about the presence of the version
> field.
>
> Antoine suggested removing the version field from the ALP header.
>
> 0 | version | 1 byte | uint8 | Format version (current is 1)
>
> The concern raised was along the lines of it complicating the API.
>
> Having thought through this more, I now agree with Antoine.
>
> In slightly more formal words:
> *Parquet's standard practice is to handle format evolution at the Thrift
> metadata level by introducing a new Encoding enum, rather than embedding
> version numbers within the page's binary data stream. Managing versioning
> through metadata allows readers to immediately identify unsupported formats
> and fail gracefully before they even begin reading or decoding the page.*
>
> We touched upon it in the last Parquet meeting and I remember people in the
> meeting also agreed with the suggestion.
>
> Now the issue is that the header was exactly 8 bytes :). Do I keep 1 byte
> as padding or move the header to 7 bytes?
> Any recommendations?
>
> On Tue, Feb 17, 2026 at 4:27 PM PRATEEK GAUR <[email protected]> wrote:
>
> > Hi team,
> >
> > 1) Andrew
> >
> >    - Thanks for working on test files. My PR did add all the test files I
> >    used to benchmark on datasets. Maybe we can club it together. Will
> >    also aid cross-language testing.
> >    - Kosta Tarasov working on Rust implementation. This is great. Thanks
> >
> >
> > 2) Antoine
> >
> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
> >    getting 8x the decoding performance of BSS. This is amazing!!
> >    - Thanks for acknowledging the sampling design.
> >    - I agree with you on FastLanes. In some crude experiments I didn't
> >    get a good perf benefit from it on Graviton3 (but maybe there was
> >    something wrong with my implementation).
> >    - Locking the 16-bit exception encoding for the spec in this case.
> >    - Awesome, I think we have solved for all open questions minus the
> >    version byte :). (will get back on this soon)
> >
> >
> > 3) Micah
> >
> >    - FastLanes: The current spec does allow for using FastLanes with the
> >    configurable enum value for layout. We should be able to inject any
> >    layout in the current design.
> >
> >
> > Working on resolving all remaining open comments on the spec this week.
> >
> > Best
> > Prateek
> >
> >
> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
> > wrote:
> >
> >> On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
> >> wrote:
> >>
> >> > It looks like the actual issue described for ORC in the paper is that
> >> > it has multiple sub-encodings in a batch. This is different than the
> >> > design proposed here where there is still fixed encoding per page in
> >> > Parquet. Given reasonably sized pages I don't think branch
> >> > misprediction should be a big issue for new encodings. I agree that we
> >> > should be conservative in general for adding new encodings.
> >> >
> >> +1
