[
https://issues.apache.org/jira/browse/CALCITE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810169#comment-17810169
]
Julian Hyde edited comment on CALCITE-6188 at 1/24/24 2:41 AM:
---------------------------------------------------------------
Thank you, [~zabetak]! I knew Hive had such a syntax, but I wasn't able to find
it. Your example would translate to
{code:java}
WITH
empDelta AS (
SELECT * FROM emp2
EXCEPT
SELECT * FROM emp)
MULTI
insert1 AS (
INSERT INTO TABLE tbl1
SELECT *
FROM empDelta
WHERE empno > 100),
insert2 AS (
INSERT INTO TABLE tbl2
SELECT *
FROM empDelta
WHERE empno < 50);
{code}
Mutations occurring inside the {{MULTI}} clause would cause a single row to be
returned with a row count and status. Thus the whole statement might return
{noformat}
+-------------------+-------------------+
| insert1 | insert2 |
+--------+----------+--------+----------+
| status | rowCount | status | rowCount |
+--------+----------+--------+----------+
| ok | 10 | ok | 25 |
+--------+----------+--------+----------+
1 row returned.
{noformat}
was (Author: julianhyde):
Thank you, [~zabetak]! I knew Hive had such a syntax, but I wasn't able to find
it. Your example would translate to
{code:java}
WITH
empDelta AS (
SELECT * FROM emp2
EXCEPT
SELECT * FROM emp)
MULTI
insert1 AS (
INSERT INTO TABLE tbl1 SELECT * WHERE empno > 100),
insert2 AS (
INSERT INTO TABLE tbl2 SELECT * WHERE empno < 50);
{code}
Mutations occurring inside the {{MULTI}} clause would cause a single row to be
returned with a row count and status. Thus the whole statement might return
{noformat}
+-------------------+-------------------+
| insert1 | insert 2 |
+--------+----------+--------+----------+
| status | rowCount | status | rowCount |
+--------+----------+--------+----------+
| ok | 10 | ok | 25 |
+--------+----------+--------+----------+
1 row returned.
{noformat}
> Multi-query optimization
> ------------------------
>
> Key: CALCITE-6188
> URL: https://issues.apache.org/jira/browse/CALCITE-6188
> Project: Calcite
> Issue Type: Improvement
> Reporter: Julian Hyde
> Priority: Major
>
> Devise extensions to SQL for multi-query optimization (MQO). Queries with
> multiple input tables, multiple intermediate tables, and multiple output data
> sets (result sets and DML operations) can be defined in one SQL statement,
> optimized, and executed atomically.
> There are many flavors of multi-query optimization, depending on whether each
> occurrence of "multiple" in the previous paragraph is replaced with 0, 1 or
> "several". Our goal, here, is to allow them all to be expressed. We can then
> devise planning strategies that work for particular flavors.
> Examples of multi-queries:
> * {*}Multiple DML outputs{*}. An INSERT statement that writes into a table
> but also updates an index,
> * {*}Multiple DML outputs, complex intermediate tables{*}. A DAG that
> represents an ETL/ELT job;
> * {*}Multiple query outputs{*}. A query that produces several data sets (say
> a list of invoices for
> orders and a list of products that need to be restocked);
> * {*}DAG query{*}. A query that uses intermediate results more than once.
> See discussion in the [Multi-query optimization email
> thread|https://lists.apache.org/thread/mcdqwrtpx0os54t2nn9vtk17spkp5o5k].
> Here are some SQL examples.
> We add a new keyword {{MULTI}} that represents a statement whose output
> contains multiple data sets and DML operations, each with a unique name. For
> intermediate results, we use the existing {{WITH}} clause.
> h3. 1. Multi DML
> Read from one or more tables, write to one or more tables.
> An example is inserting into a table and also an index on that table
> (represented as a sorted table).
> {code:sql}
> WITH
> empDelta AS (
> SELECT * FROM emp2
> EXCEPT
> SELECT * FROM emp)
> MULTI
> insertEmp AS (
> INSERT INTO emp
> TABLE empDelta),
> insertEmpDeptno AS (
> MERGE empDeptno AS e
> USING TABLE empDelta AS d
> ON e.deptno = d.deptno
> WHEN NOT MATCHED THEN INSERT VALUES (deptno));
> {code}
> h3. 2. Query that creates temporary table and uses it more than once
> {code:sql}
> WITH
> temp AS (
> SELECT *
> FROM emp AS e
> JOIN dept USING (deptno)
> WHERE e.job = 'MANAGER'
> OR d.location = 'CHICAGO')
> SELECT deptno,
> (SELECT AVG(sal)
> FROM temp AS t
> WHERE t.deptno = e.deptno) AS deptAvgSal,
> (SELECT AVG(sal)
> FROM temp AS t
> WHERE t.job = e.job) AS jobAvgSal
> FROM e
> WHERE e.deptno IN (10, 20);
> {code}
> h3. 3. Query whose optimal plan might use a temporary table
> This query produces the same result as the previous query. There is a common
> relational expression, so the optimizer should consider a DAG plan with a
> reified intermediate result.
> {code:sql}
> SELECT deptno,
> (SELECT AVG(e2.sal)
> FROM emp AS e2
> JOIN dept AS d USING (deptno)
> WHERE (e2.job = 'MANAGER'
> OR d.location = 'CHICAGO')
> AND e2.deptno = e.deptno) AS deptAvgSal,
> (SELECT AVG(e3.sal)
> FROM emp AS e3
> JOIN dept AS d USING (deptno)
> WHERE (e3.job = 'MANAGER'
> OR d.location = 'CHICAGO')
> AND e3.job = e.job) AS jobAvgSal
> FROM e
> WHERE e.deptno IN (10, 20);
> {code}
> h3. 4. Query that produces several data sets
> {code:sql}
> WITH
> newOrders AS (
> SELECT *
> FROM orders
> WHERE orderDate > DATE '2023-01-25')
> MULTI
> invoices AS (
> SELECT customerName, SUM(amount)
> FROM newOrders
> GROUP BY customerName),
> restock AS (
> SELECT productId
> FROM inventory
> WHERE productId IN (
> SELECT productId FROM newOrders)
> AND itemsOnHand < 10);
> {code}
> h3. 5. Query with a complex DAG, multiple output data sets and one DML
> {code:sql}
> WITH
> t0 AS (
> SELECT * FROM t WHERE x > 5),
> t00 AS (
> SELECT * FROM t0 WHERE y < 10),
> t000 AS (
> SELECT x, MIN(y) FROM t00 GROUP BY x),
> t1 AS (
> SELECT * FROM t WHERE y > 3),
> t10 AS (
> SELECT * FROM t1 WHERE x < 8),
> t2 AS (
> SELECT * FROM t000
> INTERSECT
> SELECT * FROM t10),
> t3 AS (
> SELECT * FROM u
> INTERSECT
> SELECT * FROM t00)
> MULTI
> q0 AS (SELECT * FROM t3),
> q1 AS (SELECT * FROM t2),
> d0 AS (DELETE FROM v WHERE v.x IN (SELECT x FROM t1));
> {code}
> The data flow is the following DAG:
> {noformat}
> u --------------------+
> +--------------> t3 (q0)
> +--> t0 --> t00 --+
> | +--> t000 --+
> t --+ +--> t2 (q1)
> | +--> t10 -----------+
> +--> t1 --+
> +----------------------> d0
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)