> > So my current inclination is to go with a custom fixed-size struct
> > header indicating the physical type, encoding type and perhaps a couple
> > other pieces of information.
Sounds good, thank you for the context.

Cheers,
Micah

On Mon, Feb 9, 2026 at 6:59 AM Steve Loughran <[email protected]> wrote:

> I saw an interesting video on this topic - was anyone at the conference?
>
> https://youtu.be/h3UcecN5fvQ?si=PlhrwMIv8s_wxAF1
>
> Antoine, given you clearly understand the topic, what exactly does the
> content at 25:30 mean (especially in terms of Parquet)?
>
> FYI, the ASF Community over Code conference in Glasgow will have its CfP
> announced before long, and I think some talks on code security would be
> good. I've got a working title of one "Open Source and CVEs: the forever
> war"...
> Something on fuzzing would be really good too.
>
> On Mon, 9 Feb 2026 at 09:23, Antoine Pitrou <[email protected]> wrote:
>
> > Hi Micah,
> >
> > On 08/02/2026 at 21:08, Micah Kornfield wrote:
> > >>
> > >> I am also toying with the idea of an encoding/decoding fuzzer that
> > >> roundtrips data (see "function/inverse pairs" in
> > >> https://blog.regehr.org/archives/856). The question becomes in which
> > >> format the fuzzer would accept input data for the encoding step (as
> > >> Parquet files, which would mean a decoding/encoding/decoding roundtrip?
> > >> as Arrow IPC files, which are a simpler format?).
> > >
> > > Sorry for the late reply. It could also be the IPC JSON testing format?
> >
> > It could, but that introduces more overhead. The current Parquet full
> > file fuzzer runs at around 100 iterations/second. Ideally a low-level
> > Parquet encoding fuzzer should run at least 1-2 orders of magnitude
> > faster so as to explore the search space more quickly.
> >
> > So my current inclination is to go with a custom fixed-size struct
> > header indicating the physical type, encoding type and perhaps a couple
> > other pieces of information.
> >
> > Regards
> >
> > Antoine.
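
For readers following the thread, here is a minimal sketch of what a roundtrip ("function/inverse pair") fuzz target driven by a fixed-size header might look like. Everything in it is an assumption made for illustration: the 4-byte header layout, the names `HEADER`, `parse_input`, `encode`, `decode` and `fuzz_one`, and the toy "plain" and "delta" encodings, which merely stand in for real Parquet encodings. It is not the design Antoine eventually settled on.

```python
import struct

# Hypothetical fixed-size header: physical type id, encoding id, 2 reserved bytes.
HEADER = struct.Struct("<BBH")
TYPE_WIDTHS = [4, 8]            # toy stand-ins for INT32 and INT64
ENCODINGS = ["plain", "delta"]  # toy encodings, not Parquet's real ones


def parse_input(data):
    """Split raw fuzzer bytes into (width, encoding, values); None if too short."""
    if len(data) < HEADER.size:
        return None
    type_id, enc_id, _reserved = HEADER.unpack_from(data)
    # Reduce modulo the table size so every header selects a valid combination.
    width = TYPE_WIDTHS[type_id % len(TYPE_WIDTHS)]
    encoding = ENCODINGS[enc_id % len(ENCODINGS)]
    payload = data[HEADER.size:]
    usable = len(payload) - len(payload) % width  # drop a trailing partial value
    values = [int.from_bytes(payload[i:i + width], "little", signed=True)
              for i in range(0, usable, width)]
    return width, encoding, values


def encode(values, width, encoding):
    """Serialize signed integers, optionally as differences from the previous value."""
    mask = (1 << (8 * width)) - 1
    if encoding == "delta":
        deltas, prev = [], 0
        for v in values:
            deltas.append(v - prev)
            prev = v
        values = deltas
    # Wrap to the fixed width (two's complement) before serializing.
    return b"".join((v & mask).to_bytes(width, "little") for v in values)


def decode(buf, width, encoding):
    """Inverse of encode(): rebuild the signed integers from the byte buffer."""
    bits = 8 * width
    mask = (1 << bits) - 1
    raw = [int.from_bytes(buf[i:i + width], "little")
           for i in range(0, len(buf), width)]
    if encoding == "delta":
        acc, sums = 0, []
        for d in raw:
            acc = (acc + d) & mask
            sums.append(acc)
        raw = sums
    # Reinterpret unsigned words as signed integers.
    return [x - (1 << bits) if x >> (bits - 1) else x for x in raw]


def fuzz_one(data):
    """One fuzz iteration: returns False if the input is unusable, True otherwise."""
    parsed = parse_input(data)
    if parsed is None:
        return False  # input too short to carry a header
    width, encoding, values = parsed
    # The roundtrip property: decoding the encoded values must give them back.
    assert decode(encode(values, width, encoding), width, encoding) == values
    return True
```

Because the header is a handful of bytes parsed with a single unpack call, nearly all of each iteration is spent in the encoder and decoder under test rather than in file parsing, which is the speed argument made in the thread.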
