On Thu, 14 May 2026 at 15:25, Antoine Pitrou <[email protected]> wrote:
> I haven't really followed Variant development, but it's extremely
> reasonable for implementations to choose reasonable nesting limits (say,
> 64 levels).

Yes, some limit is needed. The JSON one is 500, so I took that.

> I would point out that we already have somewhat similar limits in Parquet
> C++ for Thrift decoding:
>
> https://github.com/apache/arrow/blob/c1036681b099c5f9b0684a710be04bb7619e926f/cpp/src/parquet/properties.h#L105-L121
>
> I'll add that parsing Variants is a natural target for fuzz testing.

Less so than the compression stuff. The challenge with Variants is that you
don't want to be so rigorous that validation hurts performance. Arrow's Rust
Parquet implementation does strict validation on demand. In my PR, metadata
content validation ("monotonically increasing offsets into the data") stays
as it is today: on demand, when you do a lookup. With Neelesh's cached
metadata there's only one lookup per key; repeated lookups are a real
performance killer right now.
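To illustrate what I mean by on-demand validation, here is a rough sketch
(names and layout are hypothetical and simplified, not the actual PR code):
the only offsets that get checked for monotonicity are the two touched by
the key you actually read, at the moment you read it.

  // Sketch only: hypothetical names, simplified layout.
  #include <cstdint>
  #include <stdexcept>
  #include <string_view>
  #include <vector>

  // Hypothetical view over a Variant metadata dictionary: `offsets` holds
  // num_keys + 1 positions into `bytes` and must be monotonically increasing.
  struct MetadataView {
    std::vector<uint32_t> offsets;
    std::string_view bytes;

    // On-demand validation: only the pair of offsets needed for this key is
    // checked, and only when the key is actually looked up.
    std::string_view GetKey(size_t i) const {
      const uint32_t begin = offsets.at(i);
      const uint32_t end = offsets.at(i + 1);
      if (begin > end || end > bytes.size()) {
        throw std::runtime_error("Variant metadata: non-monotonic offset");
      }
      return bytes.substr(begin, end - begin);
    }
  };

Eager validation would instead scan every offset up front, which is exactly
the cost I'd like to avoid; with cached metadata each key is extracted (and
therefore validated) at most once.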
