Re: Fuzzing Parquet C++

Antoine Pitrou Mon, 09 Feb 2026 01:23:09 -0800


Hi Micah,

On 08/02/2026 at 21:08, Micah Kornfield wrote:


I am also toying with the idea of an encoding/decoding fuzzer that
roundtrips data (see "function/inverse pairs" in
https://blog.regehr.org/archives/856). The question becomes in which
format the fuzzer would accept input data for the encoding step (as
Parquet files, which would mean a decoding/encoding/decoding roundtrip?
as Arrow IPC files, which are a simpler format?).


Sorry for the late reply. It could also be the IPC JSON testing format?

It could, but that introduces more overhead. The current Parquet full
file fuzzer runs at around 100 iterations/second. Ideally a low-level
Parquet encoding fuzzer should run at least 1-2 orders of magnitude
faster so as to explore the search space more quickly.

So my current inclination is to go with a custom fixed-size struct
header indicating the physical type, encoding type and perhaps a couple
other pieces of information.
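
To make this concrete, here is a rough sketch of what such a header and
a roundtrip check could look like (C++17). Everything in it is an
assumption for illustration: the names FuzzHeader, FuzzInput,
ParseFuzzInput, ToyEncode and ToyDecode are hypothetical, the field
layout is only one possible choice, and the toy run-length codec merely
stands in for the real Parquet encoders and decoders. Only
LLVMFuzzerTestOneInput is the standard libFuzzer entry point.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical fixed-size header placed at the start of each fuzzer input.
// Field names, widths and meanings are illustrative assumptions.
struct FuzzHeader {
  uint8_t physical_type;  // would select a Parquet physical type
  uint8_t encoding;       // would select an encoding
  uint16_t flags;         // room for "a couple other pieces of information"
  uint32_t num_values;    // number of values the payload claims to hold
};
static_assert(sizeof(FuzzHeader) == 8, "header must stay fixed-size");

struct FuzzInput {
  FuzzHeader header;
  const uint8_t* payload;
  size_t payload_size;
};

// Split raw fuzzer bytes into header + payload; reject inputs that are
// too short to carry a header.
std::optional<FuzzInput> ParseFuzzInput(const uint8_t* data, size_t size) {
  if (size < sizeof(FuzzHeader)) return std::nullopt;
  FuzzInput in;
  std::memcpy(&in.header, data, sizeof(FuzzHeader));  // no unaligned reads
  in.payload = data + sizeof(FuzzHeader);
  in.payload_size = size - sizeof(FuzzHeader);
  return in;
}

// Toy byte-level run-length codec, a stand-in for a real encoder/decoder.
// Output is a sequence of (run length 1..255, value) pairs.
std::vector<uint8_t> ToyEncode(const uint8_t* p, size_t n) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i < n;) {
    size_t run = 1;
    while (i + run < n && p[i + run] == p[i] && run < 255) ++run;
    out.push_back(static_cast<uint8_t>(run));
    out.push_back(p[i]);
    i += run;
  }
  return out;
}

std::vector<uint8_t> ToyDecode(const std::vector<uint8_t>& enc) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i + 1 < enc.size(); i += 2) {
    out.insert(out.end(), static_cast<size_t>(enc[i]), enc[i + 1]);
  }
  return out;
}

// libFuzzer entry point: parse the header, then check the
// function/inverse pair property decode(encode(x)) == x on the payload.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  auto in = ParseFuzzInput(data, size);
  if (!in) return 0;
  // A real harness would dispatch on in->header.physical_type and
  // in->header.encoding here instead of always using the toy codec.
  auto encoded = ToyEncode(in->payload, in->payload_size);
  auto decoded = ToyDecode(encoded);
  bool same = decoded.size() == in->payload_size &&
              (in->payload_size == 0 ||
               std::memcmp(decoded.data(), in->payload,
                           in->payload_size) == 0);
  if (!same) std::abort();  // roundtrip mismatch is reported as a crash
  return 0;
}
```

The point of the fixed-size header is that parsing is a single bounds
check plus a memcpy, so almost all of the per-iteration time goes to the
encode/decode pair itself rather than to reading a container format.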


Regards

Antoine.
