Hi team,

Following up on the discussion about the version field: Antoine suggested
removing the version field from the ALP header.

0 | version | 1 byte | uint8 | Format version (current is 1)


The concern raised was that the field complicates the API.

Having thought this through some more, I now agree with Antoine.

To put it a little more formally:
Parquet's standard practice is to handle format evolution at the Thrift
metadata level by introducing a new Encoding enum value, rather than
embedding version numbers within the page's binary data stream. Managing
versioning through metadata allows readers to immediately identify
unsupported formats and fail gracefully before they even begin reading or
decoding the page.
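
As a rough sketch of what the metadata-level check could look like on the
reader side: the reader inspects the encodings listed in the column metadata
and rejects anything it does not know before touching any page bytes. The
names below (UnsupportedEncodingError, check_encodings, read_column) and the
numeric values for ALP and ALP_V2 are made up for illustration; no enum value
has been assigned here, and ALP_V2 only stands in for some future revision.

```python
from enum import IntEnum


class Encoding(IntEnum):
    # PLAIN and BYTE_STREAM_SPLIT use the values from parquet.thrift.
    PLAIN = 0
    BYTE_STREAM_SPLIT = 9
    # Placeholder values, NOT assigned numbers (assumption for this sketch).
    ALP = 1000
    ALP_V2 = 1001  # hypothetical future revision of the encoding


class UnsupportedEncodingError(Exception):
    """Raised when the metadata names an encoding the reader cannot decode."""


# Encodings this hypothetical reader knows how to decode.
SUPPORTED = {Encoding.PLAIN, Encoding.BYTE_STREAM_SPLIT, Encoding.ALP}


def check_encodings(column_encodings):
    """Reject a column chunk from its metadata alone, before any page I/O."""
    for enc in column_encodings:
        if enc not in SUPPORTED:
            raise UnsupportedEncodingError(f"unsupported encoding: {enc.name}")


def read_column(column_encodings, read_pages):
    """Validate the metadata first; only then call the page reader."""
    check_encodings(column_encodings)
    return read_pages()
```

With this shape, a file written with the hypothetical ALP_V2 fails in
check_encodings, and read_pages is never called, so no in-page version byte
is needed to detect the mismatch.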

We touched on this in the last Parquet meeting, and I remember that the
people in the meeting also agreed with the suggestion.

The remaining issue is that, with the version field, the header was exactly
8 bytes :). Should I keep 1 byte as padding, or shrink the header to 7 bytes?
Any recommendations?
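
To make the two options concrete, here is a minimal sketch using Python's
struct module. The remaining header fields are treated as an opaque 7-byte
blob, since their layout is defined by the spec and not by this example;
PADDED, PACKED, payload_offset and is_aligned are illustrative names only.

```python
import struct

# Option A: 7 bytes of header fields followed by 1 zero pad byte ("x").
PADDED = struct.Struct("<7sx")
# Option B: 7 bytes of header fields, no padding.
PACKED = struct.Struct("<7s")

# Stand-in for the real header fields (opaque in this sketch).
fields = bytes(range(7))

padded_header = PADDED.pack(fields)
packed_header = PACKED.pack(fields)


def payload_offset(header: bytes) -> int:
    """Offset of the first payload byte, relative to the start of the header."""
    return len(header)


def is_aligned(offset: int, alignment: int = 8) -> bool:
    """True if the offset is a multiple of the given alignment."""
    return offset % alignment == 0


print(PADDED.size, is_aligned(payload_offset(padded_header)))
print(PACKED.size, is_aligned(payload_offset(packed_header)))
```

The padded variant keeps the payload at an 8-byte offset from the start of
the header at the cost of one unused byte per page; the packed variant saves
that byte but starts the payload at offset 7. Whether the offset matters in
practice depends on how the page buffer itself is aligned, which this sketch
does not model.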

On Tue, Feb 17, 2026 at 4:27 PM PRATEEK GAUR <[email protected]> wrote:

> Hi team,
>
> 1) Andrew
>
>    - Thanks for working on test files. My PR added all the test files I
>    used to benchmark on datasets. Maybe we can combine them. That will also
>    aid cross-language testing.
>    - Kosta Tarasov is working on a Rust implementation. This is great. Thanks.
>
>
> 2) Antoine
>
>    - Thanks a lot for reporting the numbers on AMD. Looks like you are
>    getting 8x the decoding performance of BSS. This is amazing!
>    - Thanks for acknowledging the sampling design.
>    - I agree with you on FastLanes. In some crude experiments I didn't
>    get a good perf benefit from it on Graviton3 (but maybe there was something
>    wrong with my implementation).
>    - Locking in the 16-bit exception encoding for the spec in this case.
>    - Awesome, I think we have resolved all open questions except the
>    version byte :). (Will get back on this soon.)
>
>
> 3) Micah
>
>    - FastLanes: The current spec does allow for using FastLanes via the
>    configurable enum value for layout. We should be able to plug in any
>    layout in the current design.
>
>
> Working on resolving all remaining open comments on the spec this week.
>
> Best
> Prateek
>
>
> On Tue, Feb 10, 2026 at 3:37 AM Steve Loughran <[email protected]>
> wrote:
>
>> On Sun, 8 Feb 2026 at 18:12, Micah Kornfield <[email protected]>
>> wrote:
>>
>> >
>> >
>> > It looks like the actual issue described for ORC in the paper is that it
>> > has multiple sub-encodings in a batch.  This is different from the design
>> > proposed here, where there is still a fixed encoding per page in Parquet.
>> > Given reasonably sized pages I don't think branch misprediction should be
>> > a big issue for new encodings.  I agree that we should be conservative in
>> > general for adding new encodings.
>> >
>> >
>> +1
>>
>
