masumi-ryugo commented on issue #21998:
URL: https://github.com/apache/datafusion/issues/21998#issuecomment-4378382425
Thanks @alamb and @comphead.

@alamb, your "fuzz can use only public APIs / doesn't need to live in this repo" framing is exactly the lighter-touch model I'd default to, so that's helpful to hear.

@comphead, agreed that SQL-aware query generation (SQLsmith-style: generate thousands of valid-ish queries covering joins, built-ins, etc., then assert that the parser/analyzer don't panic and stay within basic performance bounds) is more valuable per input than byte-level fuzzing on a mature SQL surface. I'd treat it as complementary to byte-level corpus mutation rather than a replacement, since the two find different kinds of bugs: a grammar generator finds bugs in the *combinations* the grammar can express, while a corpus mutator finds bugs in malformed or edge-case bytes near the corpus. They're different enough harnesses that I think they're best authored by different people. I'm not the right person for the grammar-generator one (no SQLsmith experience), so I'll scope my own follow-through to the byte-level side only.

Of the two narrower options I floated:

**Going to start with (a): wrap `datafusion-sqlparser-rs/fuzz` under OSS-Fuzz.** It's the least intrusive option: zero code change in this repo or in `datafusion-sqlparser-rs`, just a new `projects/datafusion-sqlparser-rs/` directory in `google/oss-fuzz` that drives the existing honggfuzz harness. Concrete next step before any PR: I'll open a small issue on `apache/datafusion-sqlparser-rs` asking for `primary_contact` + `auto_ccs` Google-account emails (OSS-Fuzz needs them for crash notifications, and there's no PMC alias path; this is the same prerequisite I hit on `apache/arrow-rs#5332`). Once I have those, I can send the `google/oss-fuzz` PR; it's mechanical. I'll cross-link both back here.

**Parking (b): corpus-mutation harness in `datafusion/sql/tests`.** I won't touch this repo's code without an explicit go-ahead from @2010YOUY01, since this option does require an in-repo crate, and that's the part you sounded skeptical about.
Happy to drop (b) entirely if you'd rather, or to revisit it once (a) has produced (or not produced) anything interesting to bring back. No code action from me until the contact-email issue on `datafusion-sqlparser-rs` is resolved. I'll report back once it's filed.
