+1! š„ š¾ On Thu, 5 Jun 2025 at 11:50, Michael S. Molina <michael.s.mol...@gmail.com> wrote:
> Huge congratulations on getting SIP-117 across the finish line! š > > Itās clear this took a ton of effortāsix months, 18 PRs, and a whole lot > of careful thinkingābut the result is a major achievement. Centralizing all > SQL parsing into SQLScript and SQLStatement, with dialect-aware support and > full test coverage, is a massive win for both maintainability and security. > I really appreciate this improvement; itās one of those foundational pieces > that quietly enables so much to work safely and efficiently across > Superset. Kudos to you and everyone who contributed! > > Best regards, > Michael S. Molina > > > On 5 Jun 2025, at 15:27, Quentin Leroy <quentin.n.le...@gmail.com> > wrote: > > > > Thank you !! > > > >> On Thu, Jun 5, 2025 at 5:43āÆPM <robe...@dealmeida.net.invalid> wrote: > >> > >> Hi, all! > >> > >> I just wanted to give a heads up that SIP-117 > >> (github.com/apache/superset/issues/26786), "Improve SQL parsing", has > >> been fully implemented. We now have all the codebase using a single > >> parser library (`sqlglot`) through two new classes: `SQLScript` and > >> `SQLStatement` (a script is a sequence of statements). > >> > >> With this change, the SQL parsing in Superset is now dialect-dependent. > >> Of the 60 engines we support, 41 have dedicated dialects. Adding new > >> dialects is relatively easy, and during the work for SIP-117 I created a > >> Druid dialect (contributed upstream to `sqlglot`) and two dialects for > >> Firebolt (maintained in the Superset repo). Better yet, all SQL parsing > >> functionality is now contained in these 2 classes, with 100% test > >> coverage. If we ever need to change the parser in the future we only > >> have to modify these classes and run the test suite to make sure > >> everything still works as expected. > >> > >> The work for SIP-117 took almost 6 months, 18 PRs, and added > >> approximately 600 lines of code and 800 lines of tests. While it's easy > >> to forget that Superset even does SQL parsing, it's a critical part of > >> our codebase. For example, parsing SQL is needed in order to set (or > >> update) limits in queries, preventing too much data from being loaded > >> into the UI. And while this might seem simple, keep in mind different > >> databases have different syntaxes for it: > >> > >> SELECT * FROM t LIMIT 10 > >> SELECT TOP 10 * FROM t > >> SELECT * FROM t FETCH FIRST 10 ROWS ONLY > >> > >> More importantly, SQL parsing is critical for security. It's used to > >> identify which tables are being accessed when a query runs, so that > >> Superset can enforce data access roles (DAR). It's used to detect > >> malicious use of functions that can expose data, as well as the > >> malicious use of subqueries in ad-hoc expressions. And it's used to > >> modify arbitrarily complex queries in place, injecting row-level > >> security (RLS) filters. > >> > >> I'd like to thanks all the contributors who helped with this SIP, > >> especially Vitor Ćvila, Elizabeth Thompson, Antonio Rivero, and Max > >> Beauchemin. > >> > >> --Beto >