[
https://issues.apache.org/jira/browse/SPARK-42551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wan Kun updated SPARK-42551:
----------------------------
Description:
h1. *Design Sketch*
* Get all common expressions from input expressions. Recursively visits all
subexpressions regardless of whether the current expression is a conditional
expression.
* For each common expression:
* Add a new boolean variable *subExprInit_n* to indicate whether we have
already evaluated the common expression, and reset it to *false* at the start
of operator.consume()
* Add a new wrapper subExpr function for common subexpression.
{code:java}
private void subExpr_n(${argList.mkString(", ")}) {
if (!subExprInit_n) {
${eval.code}
subExprInit_n = true;
subExprIsNull_n = ${eval.isNull};
subExprValue_n = ${eval.value};
}
}
{code}
* Replace all the common subexpression with the wrapper function
*subExpr_n(argList)*.
h1. *New support subexpression elimination patterns*
*
h2. *Support subexpression elimination with conditional expressions*
{code:java}
SELECT case when v + 2 > 1 then 1
when v + 1 > 2 then 2
when v + 1 > 3 then 3 END vv
FROM values(1) as t2(v)
{code}
We can reuse the result of expression *v + 1*
{code:java}
SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a
{code}
We can reuse the result of expression b + c
*
h2. *Support subexpression elimination in FilterExec*
{code:java}
SELECT * FROM (
SELECT v * v + 1 v1 from values(1) as t2(v)
) t
where v1 > 5 and v1 < 10
{code}
We can reuse the result of expression *v* * *v* *+* *1*
*
h2. *Support subexpression elimination in JoinExec*
{code:java}
SELECT *
FROM values(1, 1) as t1(a, b)
join values(1, 2) as t2(x, y)
ON b * y between 2 and 3{code}
We can reuse the result of expression *b* * *y*
*
h2. *Support subexpression elimination in ExpandExec*
{code:java}
SELECT a, count(b),
count(distinct case when b > 1 then b + c else null end) as count_bc_1,
count(distinct case when b < 0 then b + c else null end) as count_bc_2
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a
{code}
We can reuse the result of expression b + c
was:
h1. *Design Sketch*
* Get all common expressions from input expressions. Recursively visits all
subexpressions regardless of whether the current expression is a conditional
expression.
* For each common expression:
* Add a new boolean variable *subExprInit_n* to indicate whether we have
already evaluated the common expression, and reset it to *false* at the start
of operator.consume()
* Add a new wrapper subExpr function for common subexpression, and replace all
the common subexpression with the wrapper function.
{code:java}
private void subExpr_n(${argList.mkString(", ")}) {
if (!subExprInit_n) {
${eval.code}
subExprInit_n = true;
subExprIsNull_n = ${eval.isNull};
subExprValue_n = ${eval.value};
}
}
{code}
h1. *New support subexpression elimination patterns*
*
h2. *Support subexpression elimination with conditional expressions*
{code:java}
SELECT case when v + 2 > 1 then 1
when v + 1 > 2 then 2
when v + 1 > 3 then 3 END vv
FROM values(1) as t2(v)
{code}
We can reuse the result of expression *v + 1*
{code:java}
SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null)) min_bc
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a
{code}
We can reuse the result of expression b + c
*
h2. *Support subexpression elimination in FilterExec*
{code:java}
SELECT * FROM (
SELECT v * v + 1 v1 from values(1) as t2(v)
) t
where v1 > 5 and v1 < 10
{code}
We can reuse the result of expression *v* * *v* *+* *1*
*
h2. *Support subexpression elimination in JoinExec*
{code:java}
SELECT *
FROM values(1, 1) as t1(a, b)
join values(1, 2) as t2(x, y)
ON b * y between 2 and 3{code}
We can reuse the result of expression *b* * *y*
*
h2. *Support subexpression elimination in ExpandExec*
{code:java}
SELECT a, count(b),
count(distinct case when b > 1 then b + c else null end) as count_bc_1,
count(distinct case when b < 0 then b + c else null end) as count_bc_2
FROM values(1, 1, 1) as t(a, b, c)
GROUP BY a
{code}
We can reuse the result of expression b + c
> Support more subexpression elimination cases
> --------------------------------------------
>
> Key: SPARK-42551
> URL: https://issues.apache.org/jira/browse/SPARK-42551
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.2
> Reporter: Wan Kun
> Priority: Major
>
> h1. *Design Sketch*
> * Get all common expressions from input expressions. Recursively visits all
> subexpressions regardless of whether the current expression is a conditional
> expression.
> * For each common expression:
> * Add a new boolean variable *subExprInit_n* to indicate whether we have
> already evaluated the common expression, and reset it to *false* at the start
> of operator.consume()
> * Add a new wrapper subExpr function for common subexpression.
> {code:java}
> private void subExpr_n(${argList.mkString(", ")}) {
> if (!subExprInit_n) {
> ${eval.code}
> subExprInit_n = true;
> subExprIsNull_n = ${eval.isNull};
> subExprValue_n = ${eval.value};
> }
> }
> {code}
> * Replace all the common subexpression with the wrapper function
> *subExpr_n(argList)*.
> h1. *New support subexpression elimination patterns*
> *
> h2. *Support subexpression elimination with conditional expressions*
> {code:java}
> SELECT case when v + 2 > 1 then 1
> when v + 1 > 2 then 2
> when v + 1 > 3 then 3 END vv
> FROM values(1) as t2(v)
> {code}
> We can reuse the result of expression *v + 1*
> {code:java}
> SELECT a, max(if(a > 0, b + c, null)) max_bc, min(if(a > 1, b + c, null))
> min_bc
> FROM values(1, 1, 1) as t(a, b, c)
> GROUP BY a
> {code}
> We can reuse the result of expression b + c
> *
> h2. *Support subexpression elimination in FilterExec*
>
> {code:java}
> SELECT * FROM (
> SELECT v * v + 1 v1 from values(1) as t2(v)
> ) t
> where v1 > 5 and v1 < 10
> {code}
> We can reuse the result of expression *v* * *v* *+* *1*
> *
> h2. *Support subexpression elimination in JoinExec*
>
> {code:java}
> SELECT *
> FROM values(1, 1) as t1(a, b)
> join values(1, 2) as t2(x, y)
> ON b * y between 2 and 3{code}
>
> We can reuse the result of expression *b* * *y*
> *
> h2. *Support subexpression elimination in ExpandExec*
> {code:java}
> SELECT a, count(b),
> count(distinct case when b > 1 then b + c else null end) as count_bc_1,
> count(distinct case when b < 0 then b + c else null end) as count_bc_2
> FROM values(1, 1, 1) as t(a, b, c)
> GROUP BY a
> {code}
> We can reuse the result of expression b + c
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]