mistercrunch commented on code in PR #34111:
URL: https://github.com/apache/superset/pull/34111#discussion_r2202881963


##########
superset/sql_lab.py:
##########
@@ -475,7 +475,9 @@ def execute_sql_statements(  # noqa: C901
             db.session.commit()
 
             # Hook to allow environment-specific mutation (usually comments) 
to the SQL
-            query.executed_sql = database.mutate_sql_based_on_config(block)
+            query.executed_sql = database.mutate_sql_based_on_config(
+                block, is_split=config["MUTATE_AFTER_SPLIT"]

Review Comment:
   Right. I remember some of the context now. Generally we're trying to bring 
the SQL-execution code paths back together, and there's more work to be done 
here. 
   
   I think `SQL_QUERY_MUTATOR` is a great config hook, but seems it has grown a 
bit complex. Tempting to clarify/redesign, but we have to be cautious around 
backwards compatibility. Dilemma is probably around change management here.
   
   Let's try to understand the ideal scenario here, and see if we can preserve 
backwards compatibility.
   
   First, let's start with trying to itemize common use cases for 
`SQL_QUERY_MUTATOR`:
   - #1 is clearly logging, where we add comments to all the SQL executed with 
information about the query provenance, and the user who triggered it. This 
allows database admins to get more visibility into who's behind the service 
account. Many organizations use this to parse user/tool/context information and 
use it for analytics
   - some security stuff, maybe you have complex data access policy where you 
want to intercept some things, or double check some rules, and fail queries if 
something that shouldn't be allowed is submitted, simply `raise` an exception 
from here
   - some redirection / rewiring, maybe redirecting to a similar dataset for 
things like aggregate awareness, ...
   - ?
   
   Thinking about his method and keeping it backwards compatible, one thing we 
could do is passing in more context information, like a `source` parameter that 
would inform whether this SQL comes from SQL Lab, Explore, ... Adding net new 
args to the function should be backwards compatible as long as user 
catch/disregard *args **kwargs.
   
   About the `is_split` flag, looking at the code, it looks like in explore we 
call `mutate_sql_based_on_config` both before and after the SQL is broken into 
chunks for execution. Seems this can be confusing for people authoring 
`SQL_QUERY_MUTATOR`, where you have to assume something like:
   - in explore, this method will be called twice, once with is_split=False 
first, and once more for each statement with is_split=True
   - in SQL Lab, will be called once for the block (could probably be called 
twice too following the same pattern as for explore)
   - throw in some session/env settings, stuff like calling `SET` and things 
like that. Kind of a hack but who knows what people might be using this for ...
   
   Knowing all this, it's unclear to me what to do, especially given the 
backwards-compatibility concerns.
   
   Best "mostly backwards"-compatible way forward: add a `source` args for 
clarity, call twice for SQL Lab for consistency (?) Add some notes in 
UPGRADING.md about the changes to notify upgraders.
   
   Breaking backwards compatibility? Flag `SQL_QUERY_MUTATOR` for deprecation 
and redesign the hook(s), maybe something like:
   ```
   SQL_BLOCK_MUTATOR(source)  # always called even if block has a single 
statement
   SQL_STATEMENT_MUTATOR(source) # called for each sub-statement
   ```
   
   Curious on what @betodealmeida or others might thing is best here. This 
could be slated around overdue work around bringing all SQL execution codepaths 
together, and fix/improve other related issues around code and logic 
duplication across explore and SQL Lab.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to