>
> Now the issue is that the header was exactly 8 bytes :). Do I keep 1 byte
> as padding or move the header to 7 bytes?
> Any recommendations?


I think it is fine for it to be 7 bytes, despite the asymmetry.  We aren't
likely to be loading these on aligned boundaries IIRC, and we are using
safe loading anyway?
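
To make the alignment point concrete, here is a minimal sketch of reading a 7-byte header with a "safe" (copying) load. The field names and layout below are hypothetical, chosen only so that the sizes add up to 7; the actual ALP header fields are defined by the spec and are not listed in this thread.

```python
import struct

# Hypothetical 7-byte layout, for illustration only (not the real ALP header):
#   uint8  layout        (1 byte)
#   uint16 vector_size   (2 bytes)
#   uint32 num_elements  (4 bytes)
# "<" selects little-endian with standard sizes and no padding between fields.
HEADER = struct.Struct("<BHI")


def read_header(buf, offset):
    """Copy the header fields out of buf at any offset, aligned or not."""
    if offset < 0 or offset + HEADER.size > len(buf):
        raise ValueError("truncated header")
    return HEADER.unpack_from(buf, offset)


# The header starts at an odd offset (3) and still parses correctly.
page = b"\xff\xff\xff" + HEADER.pack(1, 1024, 50000) + b"payload"
print(HEADER.size)           # 7
print(read_header(page, 3))  # (1, 1024, 50000)
```

Because the fields are copied out rather than read through a cast pointer, a padding byte would not be needed for correctness in a reader like this.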

On Tue, Feb 24, 2026 at 12:52 PM PRATEEK GAUR <[email protected]> wrote:

> Hi team,
>
> Following up on the concern Antoine raised about the presence of the
> version field.
>
> Antoine suggested removing the version field from the ALP header.
>
> 0 | version | 1 byte | uint8 | Format version (current is 1)
>
>
> The concern raised was that it complicates the API.
>
> Having thought this through more, I now agree with Antoine.
>
> To put it a little more formally:
> Parquet's standard practice is to handle format evolution at the Thrift
> metadata level by introducing a new Encoding enum, rather than embedding
> version numbers within the page's binary data stream. Managing versioning
> through metadata allows readers to immediately identify unsupported formats
> and fail gracefully before they even begin reading or decoding the page.
>
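
The metadata-level check described above could look roughly like the following sketch. The enum values, the `ALP_V2` member, and the `decode_alp` and `decode_page` helpers are all hypothetical and used only for illustration; they are not the real parquet.thrift numbers or any library's API.

```python
from enum import IntEnum


class Encoding(IntEnum):
    # Placeholder values, NOT the real parquet.thrift numbers.
    PLAIN = 0
    ALP = 100      # hypothetical id for the encoding discussed here
    ALP_V2 = 101   # hypothetical future revision, added as a new enum value


class UnsupportedEncoding(Exception):
    pass


def decode_alp(data):
    # Stand-in decoder: a real one would parse the header and vectors.
    return ("alp", len(data))


# This reader only knows the first revision.
DECODERS = {Encoding.ALP: decode_alp}


def decode_page(encoding_id, read_page_bytes):
    """Reject unknown encodings before any page bytes are fetched."""
    decoder = DECODERS.get(encoding_id)
    if decoder is None:
        raise UnsupportedEncoding(f"encoding id {encoding_id} is not supported")
    return decoder(read_page_bytes())
```

With this shape, an old reader that meets the newer enum value fails from the metadata alone, and the callable that would fetch the page body is never invoked.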
> We touched upon it in the last Parquet meeting, and I remember that people
> there also agreed with the suggestion.
>
> Now the issue is that the header was exactly 8 bytes :). Do I keep 1 byte
> as padding or move the header to 7 bytes?
> Any recommendations?
>
> On Tue, Feb 17, 2026 at 4:27 PM PRATEEK GAUR <[email protected]> wrote:
>
> > Hi team,
> >
> > 1) Andrew
> >
> >    - Thanks for working on test files. My PR did add all the test files I
> >    used to benchmark on datasets. Maybe we can combine them. That will
> >    also aid cross-language testing.
> >    - Kosta Tarasov is working on the Rust implementation. This is great.
> >    Thanks!
> >
> >
> > 2) Antoine
> >
> >    - Thanks a lot for reporting the numbers on AMD. Looks like you are
> >    getting 8x the decoding performance of BSS. This is amazing!
> >    - Thanks for acknowledging the sampling design.
> >    - I agree with you on FastLanes. In some crude experiments I didn't
> >    get a good perf benefit from it on Graviton3 (but maybe there was
> >    something wrong with my implementation).
> >    - Locking in the 16-bit exception encoding for the spec in this case.
> >    - Awesome, I think we have resolved all open questions except the
> >    version byte :). (Will get back on this soon.)
> >
> >
> > 3) Micah
> >
> >    - FastLanes: The current spec does allow for using FastLanes via the
> >    configurable enum value for layout. We should be able to inject any
> >    layout in the current design.
> >
> >
> > Working on resolving all remaining open comments on the spec this week.
> >
> > Best
> > Prateek
> >
> >
> > On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
> > wrote:
> >
> >> On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
> >> wrote:
> >>
> >> >
> >> >
> >> > It looks like the actual issue described for ORC in the paper is that
> >> > it has multiple sub-encodings in a batch.  This is different from the
> >> > design proposed here, where there is still a fixed encoding per page in
> >> > Parquet.  Given reasonably sized pages I don't think branch
> >> > misprediction should be a big issue for new encodings.  I agree that we
> >> > should be conservative in general about adding new encodings.
> >> >
> >> >
> >> +1
> >>
> >
>
