[DONE] [SIP-117] Improve SQL parsing

roberto Thu, 05 Jun 2025 08:44:36 -0700

Hi, all!

I just wanted to give a heads up that SIP-117 (github.com/apache/superset/issues/26786), "Improve SQL parsing", has been fully implemented. The whole codebase now uses a single parser library (`sqlglot`) through two new classes: `SQLScript` and `SQLStatement` (a script is a sequence of statements).

With this change, the SQL parsing in Superset is now dialect-dependent. Of the 60 engines we support, 41 have dedicated dialects. Adding new dialects is relatively easy, and during the work for SIP-117 I created a Druid dialect (contributed upstream to `sqlglot`) and two dialects for Firebolt (maintained in the Superset repo). Better yet, all SQL parsing functionality is now contained in these 2 classes, with 100% test coverage. If we ever need to change the parser in the future we only have to modify these classes and run the test suite to make sure everything still works as expected.

The work for SIP-117 took almost 6 months, 18 PRs, and added approximately 600 lines of code and 800 lines of tests. While it's easy to forget that Superset even does SQL parsing, it's a critical part of our codebase. For example, parsing SQL is needed in order to set (or update) limits in queries, preventing too much data from being loaded into the UI. And while this might seem simple, keep in mind different databases have different syntaxes for it:


    SELECT * FROM t LIMIT 10
    SELECT TOP 10 * FROM t
    SELECT * FROM t FETCH FIRST 10 ROWS ONLY

More importantly, SQL parsing is critical for security. It's used to identify which tables are being accessed when a query runs, so that Superset can enforce data access roles (DAR). It's used to detect malicious use of functions that can expose data, as well as the malicious use of subqueries in ad-hoc expressions. And it's used to modify arbitrarily complex queries in place, injecting row-level security (RLS) filters.

I'd like to thank all the contributors who helped with this SIP, especially Vitor Ávila, Elizabeth Thompson, Antonio Rivero, and Max Beauchemin.


--Beto
