mistercrunch commented on code in PR #34111:
URL: https://github.com/apache/superset/pull/34111#discussion_r2202881963
##########
superset/sql_lab.py:
##########
@@ -475,7 +475,9 @@ def execute_sql_statements( # noqa: C901
db.session.commit()
# Hook to allow environment-specific mutation (usually comments)
to the SQL
- query.executed_sql = database.mutate_sql_based_on_config(block)
+ query.executed_sql = database.mutate_sql_based_on_config(
+ block, is_split=config["MUTATE_AFTER_SPLIT"]
Review Comment:
Right. I remember some of the context now. Generally we're trying to bring
the SQL-execution code paths back together, and there's more work to be done
here.
I think `SQL_QUERY_MUTATOR` is a great config hook, but seems it has grown a
bit complex. Tempting to clarify/redesign, but we have to be cautious around
backwards compatibility. Dilemma is probably around change management here.
Let's try to understand the ideal scenario here, and see if we can preserve
backwards compatibility.
First, let's start with trying to itemize common use cases for
`SQL_QUERY_MUTATOR`:
- #1 is clearly logging, where we add comments to all the SQL executed with
information about the query provenance, and the user who triggered it. This
allows database admins to get more visibility into who's behind the service
account. Many organizations use this to parse user/tool/context information and
use it for analytics
- some security stuff, maybe you have complex data access policy where you
want to intercept some things, or double check some rules, and fail queries if
something that shouldn't be allowed is submitted, simply `raise` an exception
from here
- some redirection / rewiring, maybe redirecting to a similar dataset for
things like aggregate awareness, ...
- ?
Thinking about his method and keeping it backwards compatible, one thing we
could do is passing in more context information, like a `source` parameter that
would inform whether this SQL comes from SQL Lab, Explore, ... Adding net new
args to the function should be backwards compatible as long as user
catch/disregard *args **kwargs.
About the `is_split` flag, looking at the code, it looks like in explore we
call `mutate_sql_based_on_config` both before and after the SQL is broken into
chunks for execution. Seems this can be confusing for people authoring
`SQL_QUERY_MUTATOR`, where you have to assume something like:
- in explore, this method will be called twice, once with is_split=False
first, and once more for each statement with is_split=True
- in SQL Lab, will be called once for the block (could probably be called
twice too following the same pattern as for explore)
- throw in some session/env settings, stuff like calling `SET` and things
like that. Kind of a hack but who knows what people might be using this for ...
Knowing all this, it's unclear to me what to do, especially given the
backwards-compatibility concerns.
Best "mostly backwards"-compatible way forward: add a `source` args for
clarity, call twice for SQL Lab for consistency (?) Add some notes in
UPGRADING.md about the changes to notify upgraders.
Breaking backwards compatibility? Flag `SQL_QUERY_MUTATOR` for deprecation
and redesign the hook(s), maybe something like:
```
SQL_BLOCK_MUTATOR(source) # always called even if block has a single
statement
SQL_STATEMENT_MUTATOR(source) # called for each sub-statement
```
Curious on what @betodealmeida or others might thing is best here. This
could be slated around overdue work around bringing all SQL execution codepaths
together, and fix/improve other related issues around code and logic
duplication across explore and SQL Lab.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]