I saw an interesting video on this topic. Was anyone at the conference? https://youtu.be/h3UcecN5fvQ?si=PlhrwMIv8s_wxAF1
Antoine, given you clearly understand the topic, what exactly does the content at 25:30 mean (especially in terms of Parquet)?

FYI, the ASF Community over Code conference in Glasgow will have its CfP announced before long, and I think some talks on code security would be good. I've got a working title for one: "Open Source and CVEs: the forever war". Something on fuzzing would be really good too.

On Mon, 9 Feb 2026 at 09:23, Antoine Pitrou <[email protected]> wrote:
>
> Hi Micah,
>
> On 08/02/2026 at 21:08, Micah Kornfield wrote:
> >>
> >> I am also toying with the idea of an encoding/decoding fuzzer that
> >> roundtrips data (see "function/inverse pairs" in
> >> https://blog.regehr.org/archives/856). The question becomes in which
> >> format the fuzzer would accept input data for the encoding step (as
> >> Parquet files, which would mean a decoding/encoding/decoding roundtrip?
> >> as Arrow IPC files, which are a simpler format?).
> >
> > Sorry for the late reply. It could also be the IPC json testing format?
>
> It could, but that introduces more overhead. The current Parquet full
> file fuzzer runs at around 100 iterations/second. Ideally a low-level
> Parquet encoding fuzzer should run at least 1-2 orders of magnitude
> faster so as to explore the search space more quickly.
>
> So my current inclination is to go with a custom fixed-size struct
> header indicating the physical type, encoding type and perhaps a couple
> other pieces of information.
>
> Regards
>
> Antoine.
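
For what it's worth, here is how I picture the fixed-size header plus roundtrip check described above. This is only a rough sketch under my own assumptions: the header layout (one byte for physical type, one for encoding, two reserved), the tag values, and the names `HEADER`, `fuzz_one`, `encode_plain_int32` and `decode_plain_int32` are all made up for illustration and are not taken from Arrow or Parquet code. The toy codec just packs little-endian 32-bit integers so that the function/inverse idea is visible.

```python
import struct

# Hypothetical header layout (an assumption, not from the thread):
# 1 byte physical type, 1 byte encoding, 2 reserved bytes, little-endian.
HEADER = struct.Struct("<BBH")

# Made-up tag values for this sketch only.
PHYSICAL_INT32 = 0
ENCODING_PLAIN = 0


def decode_plain_int32(payload: bytes) -> list[int]:
    """Decode little-endian 32-bit integers, ignoring trailing partial bytes."""
    count = len(payload) // 4
    return list(struct.unpack_from(f"<{count}i", payload))


def encode_plain_int32(values: list[int]) -> bytes:
    """Inverse of decode_plain_int32 for whole 4-byte values."""
    return struct.pack(f"<{len(values)}i", *values)


def fuzz_one(data: bytes) -> bool:
    """Return True if the input was exercised, False if rejected early."""
    if len(data) < HEADER.size:
        return False
    physical_type, encoding, _reserved = HEADER.unpack_from(data)
    if (physical_type, encoding) != (PHYSICAL_INT32, ENCODING_PLAIN):
        # A real harness would dispatch to other type/encoding pairs here.
        return False
    values = decode_plain_int32(data[HEADER.size:])
    # Function/inverse check: encoding then decoding must return the values.
    assert decode_plain_int32(encode_plain_int32(values)) == values
    return True
```

The point of the tiny header is that the fuzzer's raw bytes select the type and encoding directly, so no file parsing sits between the input and the codec under test, which is presumably where the hoped-for speedup over the full-file fuzzer would come from.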
