[
https://issues.apache.org/jira/browse/CALCITE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803376#comment-17803376
]
Jay Narale edited comment on CALCITE-6188 at 1/5/24 1:15 AM:
-------------------------------------------------------------
Not sure if it is the same but currently we treat the Sequence node strictly as
a logical entity, which has the information of the DAG. We then manage the flow
of the DAG through our scheduler. This approach eliminates the necessity for a
physical operator to handle or maintain consistency and logical transformations
can be applied independently
was (Author: jaynarale):
Not sure if it is the same but currently we treat the Sequence node strictly as
a logical entity, which has the information of the DAG. We then manage the flow
of the Directed Acyclic Graph (DAG) through our scheduler. This approach
eliminates the necessity for a physical operator to handle or maintain
consistency and logical transformations can be applied independently
> Multi-query optimization
> ------------------------
>
> Key: CALCITE-6188
> URL: https://issues.apache.org/jira/browse/CALCITE-6188
> Project: Calcite
> Issue Type: Improvement
> Reporter: Julian Hyde
> Priority: Major
>
> Devise extensions to SQL so that queries with multiple input tables, multiple
> intermediate tables, and multiple output data sets (result sets and DML
> operations) can be defined in one SQL statement, optimized, and executed
> atomically.
> There are many flavors of multi-query optimization, depending on whether each
> occurrence of "multiple" in the previous paragraph is replaced with 0, 1 or
> "several". Our goal, here, is to allow them all to be expressed. We can then
> devise planning strategies that work for particular flavors.
> Examples of multi-queries:
> * {*}Multiple DML outputs{*}. An INSERT statement that writes into a table
> but also updates an index,
> * {*}Multiple DML outputs, complex intermediate tables{*}. A DAG that
> represents an ETL/ELT job;
> * {*}Multiple query outputs{*}. A query that produces several data sets (say
> a list of invoices for
> orders and a list of products that need to be restocked);
> * {*}DAG query{*}. A query that uses intermediate results more than once.
> See discussion in the [Multi-query optimization email
> thread|https://lists.apache.org/thread/mcdqwrtpx0os54t2nn9vtk17spkp5o5k].
> Here are some SQL examples.
> We add a new keyword {{MULTI}} that represents a statement whose output
> contains multiple data sets and DML operations, each with a unique name. For
> intermediate results, we use the existing {{WITH}} clause.
> h3. 1. Multi DML
> Read from one or more tables, write to one or more tables.
> An example is inserting into a table and also an index on that table
> (represented as a sorted table).
> {code:sql}
> WITH
> empDelta AS (
> SELECT * FROM emp2
> EXCEPT
> SELECT * FROM emp)
> MULTI
> insertEmp AS (
> INSERT INTO emp
> TABLE empDelta),
> insertEmpDeptno AS (
> MERGE empDeptno AS e
> USING TABLE empDelta AS d
> ON e.deptno = d.deptno
> WHEN NOT MATCHED THEN INSERT VALUES (deptno));
> {code}
> h3. 2. Query that creates temporary table and uses it more than once
> {code:sql}
> WITH
> temp AS (
> SELECT *
> FROM emp AS e
> JOIN dept USING (deptno)
> WHERE e.job = 'MANAGER'
> OR d.location = 'CHICAGO')
> SELECT deptno,
> (SELECT AVG(sal)
> FROM temp AS t
> WHERE t.deptno = e.deptno) AS deptAvgSal,
> (SELECT AVG(sal)
> FROM temp AS t
> WHERE t.job = e.job) AS jobAvgSal
> FROM e
> WHERE e.deptno IN (10, 20);
> {code}
> h3. 3. Query whose optimal plan might use a temporary table
> This query produces the same result as the previous query. There is a common
> relational expression, so the optimizer should consider a DAG plan with a
> reified intermediate result.
> {code:sql}
> SELECT deptno,
> (SELECT AVG(e2.sal)
> FROM emp AS e2
> JOIN dept AS d USING (deptno)
> WHERE (e2.job = 'MANAGER'
> OR d.location = 'CHICAGO')
> AND e2.deptno = e.deptno) AS deptAvgSal,
> (SELECT AVG(e3.sal)
> FROM emp AS e3
> JOIN dept AS d USING (deptno)
> WHERE (e3.job = 'MANAGER'
> OR d.location = 'CHICAGO')
> AND e3.job = e.job) AS jobAvgSal
> FROM e
> WHERE e.deptno IN (10, 20);
> {code}
> h3. 4. Query that produces several data sets
> {code:sql}
> WITH
> newOrders AS (
> SELECT *
> FROM orders
> WHERE orderDate > DATE '2023-01-25')
> MULTI
> invoices AS (
> SELECT customerName, SUM(amount)
> FROM newOrders
> GROUP BY customerName),
> restock AS (
> SELECT productId
> FROM inventory
> WHERE productId IN (
> SELECT productId FROM newOrders)
> AND itemsOnHand < 10);
> {code}
> h3. 5. Query with a complex DAG, multiple output data sets and one DML
> {code:sql}
> WITH
> t0 AS (
> SELECT * FROM t WHERE x > 5),
> t00 AS (
> SELECT * FROM t0 WHERE y < 10),
> t000 AS (
> SELECT x, MIN(y) FROM t00 GROUP BY x),
> t1 AS (
> SELECT * FROM t WHERE y > 3),
> t10 AS (
> SELECT * FROM t1 WHERE x < 8),
> t2 AS (
> SELECT * FROM t000
> INTERSECT
> SELECT * FROM t10),
> t3 AS (
> SELECT * FROM u
> INTERSECT
> SELECT * FROM t00)
> MULTI
> q0 AS (SELECT * FROM t3),
> q1 AS (SELECT * FROM t2),
> d0 AS (DELETE FROM v WHERE v.x IN (SELECT x FROM t1));
> {code}
> The data flow is the following DAG:
> {noformat}
> u --------------------+
> +--------------> t3 (q0)
> +--> t0 --> t00 --+
> | +--> t000 --+
> t --+ +--> t2 (q1)
> | +--> t10 -----------+
> +--> t1 --+
> +----------------------> d0
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)