Re: [I] [SIP-115] Parsing SQL expression block for virtual datasets into separate statements [superset]

via GitHub Fri, 25 Jul 2025 08:14:25 -0700


u35253 commented on issue #26646:
URL: https://github.com/apache/superset/issues/26646#issuecomment-3118487517


   ### Example use case
   
   I'll give an example that would be useful here, if multi-statement "virtual 
(custom SQL) datasets" were to be supported in Superset.
   
   - **Problem statement:** On my database provider, with the way my Superset 
is set up, every CTE that queries the same immediately upstream CTE re-runs the 
whole logic of that source CTE, so the exact same data is recomputed for each 
reference after the first.  Database caching exists only at the whole-query 
level (when the identical query was already run), so it does not help here.  
E.g., adding 2 CTEs that both process the "main" CTE (which, at times, can be 
very convenient) will TRIPLE the runtime, whether it was a "first" or "cached" 
run overall.  A 5 second query becomes 15 seconds; a 10 second query becomes 30 
seconds; a 30 second query becomes 1.5 minutes; a 1 minute query becomes 3 
minutes.  What was once "tolerable", either for development or for actual use, 
becomes "less so".
   
   - **Proposed solution:** Allow multi-statement virtual datasets.  That way, 
the First Statement could run a CACHE TABLE statement one time, in my 
database's dialect.  This writes the "main CTE" result to disk, making a disk 
cache.  Then, when the next two CTEs consume from that object (in the Second 
Statement), they could read the tiny, pre-computed dataset.  That would allow 
the "full query" to run about as long as it takes to run just the "main" CTE 
(i.e., no multiplier is applied to the runtime).
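
   A minimal sketch of the two-statement pattern above, not of anything in the 
Superset code.  Everything in it is an assumption for illustration: it uses 
Python's built-in `sqlite3` module, SQLite's `CREATE TEMP TABLE ... AS` stands 
in for a dialect-specific CACHE TABLE statement, and the `events` table, the 
`main_cached` name, and the `run_multi_statement` helper are hypothetical.  The 
statements are passed in already split, so the parsing question this SIP is 
about is left out.

```python
import sqlite3


def run_multi_statement(conn, statements):
    """Run all but the last statement for their side effects, then
    return the rows produced by the final statement."""
    *setup, final = statements
    for stmt in setup:
        conn.execute(stmt)
    return conn.execute(final).fetchall()


# Hypothetical source data, just so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7)],
)

statements = [
    # First Statement: compute the "main" result once and keep it.
    # CREATE TEMP TABLE is an assumed stand-in for CACHE TABLE.
    """
    CREATE TEMP TABLE main_cached AS
    SELECT region, SUM(amount) AS total
    FROM events
    GROUP BY region
    """,
    # Second Statement: two CTEs both read the pre-computed result
    # instead of each re-running the "main" logic.
    """
    WITH peak AS (SELECT MAX(total) AS max_total FROM main_cached),
         totals AS (SELECT SUM(total) AS grand_total FROM main_cached)
    SELECT m.region, m.total, peak.max_total, totals.grand_total
    FROM main_cached AS m, peak, totals
    ORDER BY m.region
    """,
]

rows = run_multi_statement(conn, statements)
for row in rows:
    print(row)
```

   Only the final statement's rows are returned here; the earlier statement 
exists purely to prepare the cached object.  Whether a real database treats 
such a cached table as session-scoped, and how it is cleaned up, depends on the 
dialect.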
   
   **Commentary:** Overall, in this example, without multi-statement dataset 
support, the current approach pushes toward "make the whole query run fast no 
matter what, so that tripling it is not perceptible", among other 
possibilities.  I accept that.  But there could be certain use cases where 
multi-statement datasets would be very, very convenient.
   
   This example does not get into the implementation needs in the Superset 
code, much less for the support of 50+ datasource types, or overall 
requirements.  Commenting here so that this pattern can be considered when this 
Issue gets reviewed again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


