Hi Micah,

On 08/02/2026 at 21:08, Micah Kornfield wrote:

>> I am also toying with the idea of an encoding/decoding fuzzer that
>> roundtrips data (see "function/inverse pairs" in
>> https://blog.regehr.org/archives/856). The question becomes in which
>> format the fuzzer would accept input data for the encoding step (as
>> Parquet files, which would mean a decoding/encoding/decoding roundtrip?
>> As Arrow IPC files, which are a simpler format?).
>
> Sorry for the late reply. It could also be the IPC JSON testing format?

It could, but that introduces more overhead, since the JSON would have to be parsed on every iteration. The current full-file Parquet fuzzer runs at around 100 iterations/second. Ideally, a low-level Parquet encoding fuzzer should run at least one to two orders of magnitude faster, so as to explore the search space more quickly.

So my current inclination is to go with a custom fixed-size struct header indicating the physical type, the encoding type and perhaps a couple of other pieces of information, followed by the raw encoded data.
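For illustration only, here is a rough sketch of what such an input layout could look like, written in Python for brevity (an actual target would presumably be a C++ libFuzzer entry point). Everything beyond "physical type + encoding" is an assumption: the 8-byte little-endian layout, the extra `type_length` and `num_values` fields, and the names `HEADER`, `FuzzHeader`, `parse_fuzz_input` and `roundtrip_check` are all hypothetical. The physical type values follow the `Type` enum in parquet.thrift. The roundtrip part only handles PLAIN INT32, to show the decode/encode/decode property in its simplest form.

```python
import struct
from enum import IntEnum
from typing import NamedTuple, Optional


class PhysicalType(IntEnum):
    # Values follow the Type enum in parquet.thrift
    BOOLEAN = 0
    INT32 = 1
    INT64 = 2
    INT96 = 3
    FLOAT = 4
    DOUBLE = 5
    BYTE_ARRAY = 6
    FIXED_LEN_BYTE_ARRAY = 7


# Hypothetical header: little-endian, no padding, 8 bytes total.
# physical_type (u8), encoding (u8), type_length (u16), num_values (u32)
HEADER = struct.Struct("<BBHI")
PLAIN = 0
MAX_ENCODING = 9  # assumed upper bound (BYTE_STREAM_SPLIT in parquet.thrift)


class FuzzHeader(NamedTuple):
    physical_type: PhysicalType
    encoding: int
    type_length: int
    num_values: int


def parse_fuzz_input(data: bytes) -> Optional[tuple[FuzzHeader, bytes]]:
    """Split raw fuzzer bytes into the fixed-size header and the encoded payload.

    Returns None for inputs that the fuzz target should simply skip."""
    if len(data) < HEADER.size:
        return None
    ptype, encoding, type_length, num_values = HEADER.unpack_from(data, 0)
    if ptype > max(PhysicalType) or encoding > MAX_ENCODING:
        return None
    header = FuzzHeader(PhysicalType(ptype), encoding, type_length, num_values)
    return header, data[HEADER.size:]


def decode_plain_int32(payload: bytes, num_values: int) -> Optional[list[int]]:
    # PLAIN-encoded INT32 values are 4-byte little-endian integers.
    # Check the length first so a huge num_values never builds a huge format.
    if num_values * 4 > len(payload):
        return None
    return list(struct.unpack_from(f"<{num_values}i", payload, 0))


def encode_plain_int32(values: list[int]) -> bytes:
    return struct.pack(f"<{len(values)}i", *values)


def roundtrip_check(data: bytes) -> bool:
    """Function/inverse-pair property: decode, re-encode, decode again, compare.

    Comparing decoded values rather than bytes keeps the property valid for
    encodings where several byte sequences can decode to the same values."""
    parsed = parse_fuzz_input(data)
    if parsed is None:
        return True  # uninteresting input
    header, payload = parsed
    if header.physical_type != PhysicalType.INT32 or header.encoding != PLAIN:
        return True  # only PLAIN INT32 is handled in this sketch
    values = decode_plain_int32(payload, header.num_values)
    if values is None:
        return True  # truncated payload: a decoding error, not a roundtrip bug
    reencoded = encode_plain_int32(values)
    return decode_plain_int32(reencoded, len(values)) == values
```

Because the header is fixed-size, parsing it costs a single `unpack`, so nearly all of the per-iteration time goes to the decoder under test rather than to input parsing.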

Regards

Antoine.

