Hi Micah,
On 08/02/2026 at 21:08, Micah Kornfield wrote:
>> I am also toying with the idea of an encoding/decoding fuzzer that
>> roundtrips data (see "function/inverse pairs" in
>> https://blog.regehr.org/archives/856). The question becomes in which
>> format the fuzzer would accept input data for the encoding step (as
>> Parquet files, which would mean a decoding/encoding/decoding roundtrip?
>> as Arrow IPC files, which are a simpler format?).
> Sorry for the late reply. It could also be the IPC JSON testing format?
It could, but that would introduce more overhead. The current Parquet
full-file fuzzer runs at around 100 iterations/second. Ideally, a low-level
Parquet encoding fuzzer should run at least 1-2 orders of magnitude
faster so as to explore the search space more quickly.
So my current inclination is to go with a custom fixed-size struct
header indicating the physical type, encoding type and perhaps a couple
of other pieces of information.
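
To make the idea concrete, here is a rough Python sketch of what such an
input format and roundtrip check could look like. Everything in it is an
assumption for illustration: the 4-byte header layout, the type and
encoding numbering, and the toy PLAIN/DELTA codecs, which are not
Parquet's actual encodings. A real fuzzer would call the real encoders
and decoders instead.

```python
import struct

# Hypothetical 4-byte header: physical type (u8), encoding (u8), value count (u16).
HEADER = struct.Struct("<BBH")

# Illustrative numbering only, not Parquet's Thrift enum values.
INT32, INT64 = 0, 1
PLAIN, DELTA = 0, 1
# struct format code and bit width per physical type (values handled as unsigned words).
WIDTH = {INT32: ("I", 32), INT64: ("Q", 64)}


def parse_input(data):
    """Split fuzzer input into (physical type, encoding, values), or None if invalid."""
    if len(data) < HEADER.size:
        return None
    ptype, encoding, count = HEADER.unpack_from(data)
    if ptype not in WIDTH or encoding not in (PLAIN, DELTA):
        return None
    fmt = "<%d%s" % (count, WIDTH[ptype][0])
    if len(data) - HEADER.size < struct.calcsize(fmt):
        return None
    return ptype, encoding, list(struct.unpack_from(fmt, data, HEADER.size))


def encode(ptype, encoding, values):
    """Toy encoder: PLAIN stores the values, DELTA stores wrapping differences."""
    code, bits = WIDTH[ptype]
    if encoding == DELTA:
        mask, prev, deltas = (1 << bits) - 1, 0, []
        for v in values:
            deltas.append((v - prev) & mask)
            prev = v
        values = deltas
    return struct.pack("<%d%s" % (len(values), code), *values)


def decode(ptype, encoding, count, buf):
    """Inverse of encode()."""
    code, bits = WIDTH[ptype]
    values = list(struct.unpack("<%d%s" % (count, code), buf))
    if encoding == DELTA:
        mask, prev, out = (1 << bits) - 1, 0, []
        for d in values:
            prev = (prev + d) & mask
            out.append(prev)
        values = out
    return values


def fuzz_one(data):
    """One fuzz iteration: reject malformed input, otherwise check the roundtrip."""
    parsed = parse_input(data)
    if parsed is None:
        return False
    ptype, encoding, values = parsed
    encoded = encode(ptype, encoding, values)
    assert decode(ptype, encoding, len(values), encoded) == values
    return True
```

The point of the fixed-size header is that rejecting or accepting an input
costs a single unpack and a length check, so almost all of the time per
iteration goes into the encode/decode pair itself.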
Regards
Antoine.