Hi, all!

I just wanted to give a heads up that SIP-117 (github.com/apache/superset/issues/26786), "Improve SQL parsing", has been fully implemented. We now have all the codebase using a single parser library (`sqlglot`) through two new classes: `SQLScript` and `SQLStatement` (a script is a sequence of statements).

With this change, the SQL parsing in Superset is now dialect-dependent. Of the 60 engines we support, 41 have dedicated dialects. Adding new dialects is relatively easy, and during the work for SIP-117 I created a Druid dialect (contributed upstream to `sqlglot`) and two dialects for Firebolt (maintained in the Superset repo). Better yet, all SQL parsing functionality is now contained in these 2 classes, with 100% test coverage. If we ever need to change the parser in the future we only have to modify these classes and run the test suite to make sure everything still works as expected.

The work for SIP-117 took almost 6 months, 18 PRs, and added approximately 600 lines of code and 800 lines of tests. While it's easy to forget that Superset even does SQL parsing, it's a critical part of our codebase. For example, parsing SQL is needed in order to set (or update) limits in queries, preventing too much data from being loaded into the UI. And while this might seem simple, keep in mind different databases have different syntaxes for it:

    SELECT * FROM t LIMIT 10
    SELECT TOP 10 * FROM t
    SELECT * FROM t FETCH FIRST 10 ROWS ONLY

More importantly, SQL parsing is critical for security. It's used to identify which tables are being accessed when a query runs, so that Superset can enforce data access roles (DAR). It's used to detect malicious use of functions that can expose data, as well as the malicious use of subqueries in ad-hoc expressions. And it's used to modify arbitrarily complex queries in place, injecting row-level security (RLS) filters.

I'd like to thanks all the contributors who helped with this SIP, especially Vitor Ávila, Elizabeth Thompson, Antonio Rivero, and Max Beauchemin.

--Beto

Reply via email to