Re: Fuzzing Parquet C++

Micah Kornfield Mon, 09 Feb 2026 10:14:27 -0800

>
> So my current inclination is to go with a custom fixed-size struct
> header indicating the physical type, encoding type and perhaps a couple
> other pieces of information.



Sounds good, thank you for the context.

Cheers,
Micah

On Mon, Feb 9, 2026 at 6:59 AM Steve Loughran <[email protected]> wrote:

> I saw an interesting video on this topic. Was anyone at the conference?
>
> https://youtu.be/h3UcecN5fvQ?si=PlhrwMIv8s_wxAF1
>
> Antoine, given you clearly understand the topic, what exactly does the
> content at 25:30 mean (especially in terms of parquet)?
>
> FYI, the ASF Community over Code conference in Glasgow will have its CfP
> announced before long, and I think some talks on code security would be
> good. I've got a working title for one: "Open Source and CVEs: the forever
> war"...
> Something on fuzzing would be really good too.
>
> On Mon, 9 Feb 2026 at 09:23, Antoine Pitrou <[email protected]> wrote:
>
> >
> > Hi Micah,
> >
> > On 08/02/2026 at 21:08, Micah Kornfield wrote:
> > >>
> > >> I am also toying with the idea of an encoding/decoding fuzzer that
> > >> roundtrips data (see "function/inverse pairs" in
> > >> https://blog.regehr.org/archives/856). The question becomes in which
> > >> format the fuzzer would accept input data for the encoding step (as
> > >> Parquet files, which would mean a decoding/encoding/decoding roundtrip?
> > >> as Arrow IPC files, which are a simpler format?).
> > >
> > > Sorry for the late reply. It could also be the IPC JSON testing
> > > format?
> >
> > It could, but that introduces more overhead. The current Parquet full
> > file fuzzer runs at around 100 iterations/second. Ideally a low-level
> > Parquet encoding fuzzer should run at least 1-2 orders of magnitude
> > faster so as to explore the search space more quickly.
> >
> > So my current inclination is to go with a custom fixed-size struct
> > header indicating the physical type, encoding type and perhaps a couple
> > other pieces of information.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
>
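
For readers following the thread, here is a minimal sketch of what a fuzz
target driven by a fixed-size struct header could look like. It is not the
harness discussed above: FuzzHeader, its field layout, ParseHeader and the
ToyRle* functions are all hypothetical names invented for illustration, and
a toy byte-level run-length codec stands in for the real Parquet encoders
and decoders. The only external convention assumed is the libFuzzer entry
point LLVMFuzzerTestOneInput.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical fixed-size header read from the start of each fuzz input.
// The fields and their widths are assumptions, not an actual layout.
struct FuzzHeader {
  uint8_t physical_type;  // would select the value type in a real harness
  uint8_t encoding;       // selects which codec to round-trip
  uint16_t num_values;    // upper bound on values taken from the payload
};
static_assert(sizeof(FuzzHeader) == 4, "header must stay fixed-size");

// Copies the header out of the input; returns false if the input is too short.
// The 16-bit field is read in host byte order, which is fine for fuzzing.
bool ParseHeader(const uint8_t* data, size_t size, FuzzHeader* header) {
  if (size < sizeof(FuzzHeader)) return false;
  std::memcpy(header, data, sizeof(FuzzHeader));
  return true;
}

// Toy stand-in encoder: emits (run length, value) byte pairs, runs capped at 255.
std::vector<uint8_t> ToyRleEncode(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i < in.size();) {
    size_t run = 1;
    while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
    assert(run >= 1 && run <= 255);
    out.push_back(static_cast<uint8_t>(run));
    out.push_back(in[i]);
    i += run;
  }
  return out;
}

// Inverse of ToyRleEncode: expands each (run length, value) pair.
std::vector<uint8_t> ToyRleDecode(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  for (size_t i = 0; i + 1 < in.size(); i += 2) {
    out.insert(out.end(), static_cast<size_t>(in[i]), in[i + 1]);
  }
  return out;
}

// libFuzzer entry point: parse the header, then check decode(encode(x)) == x.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  FuzzHeader header;
  if (!ParseHeader(data, size, &header)) return 0;

  const uint8_t* payload = data + sizeof(FuzzHeader);
  const size_t payload_size = size - sizeof(FuzzHeader);
  const size_t n = std::min<size_t>(header.num_values, payload_size);
  const std::vector<uint8_t> values(payload, payload + n);

  // A real harness would also dispatch on header.physical_type; this toy
  // version only switches between an identity codec and the toy RLE codec.
  std::vector<uint8_t> decoded;
  if (header.encoding % 2 == 0) {
    decoded = values;
  } else {
    decoded = ToyRleDecode(ToyRleEncode(values));
  }
  if (decoded != values) std::abort();  // round-trip mismatch is a finding
  return 0;
}
```

With clang, something like "clang++ -fsanitize=fuzzer,address" should build
this into a runnable fuzzer. Because the header is a plain fixed-size struct,
parsing costs a single memcpy per iteration, which fits the goal of running
much faster than a full-file fuzzer.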
