[jira] [Updated] (PHOENIX-7397) Optimize ClientAggregatePlan/ClientScanPlan when inner query plan is UnionPlan

chenglei (Jira) Thu, 29 Aug 2024 09:19:04 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


chenglei updated PHOENIX-7397:
------------------------------
    Description: 
A common use case of union all is to union multiple homogeneous tables and then 
further aggregate or sort , just as following sql statements：
{code:java}
select A, B,C, sum(...),sum(....)
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
group by A,B,C
having....
{code}


{code:java}
select A, B,C, sum(...),sum(...)
from (
select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
from T1
group by a1,b1,c1
union all
select a2,b2,c2,.sum(...),sum(...)....
from T2
group by a2,b2,c2
union all
select a3,b3,c3, sum(...),sum(...)....
group by a3,b3,c3
)
group by A,B,C
having....
{code}

{code:java}
select A, B,C...
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
where ......
order by A,B,C
{code}
For situations similar to above three SQL queries, if each subquery of union 
all is actually ordered by A, B, and C, then {{UnionPlan}} just needs to 
perform a simple merge on the subquery results to ensure the overall order of 
the union all . However, for the current implementation of Phoenix, the outer 
{{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this information 
within {{UnionPlan}} and will still perform heavyweight processing based on the 
assumption that the overall result of union all is completely unordered.


  was:
A common use case of union all is to union multiple homogeneous tables and then 
further aggregate or sort , just as following sql statements：
{code:java}
select A, B,C, sum(...),sum(....)
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
group by A,B,C
having....
{code}


{code:java}
select A, B,C, sum(...),sum(...)
from (
select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
from T1
group by a1,b1,c1
union all
select a2,b2,c2,.sum(...),sum(...)....
from T2
group by a2,b2,c2
union all
select a3,b3,c3, sum(...),sum(...)....
group by a3,b3,c3
)
group by A,B,C
having....
{code}

{code:java}
select A, B,C...
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
where ......
order by A,B,C
{code}
For situations similar to above three SQL queries, if each subquery of union 
all is actually ordered by A, B, and C, then {{UnionPlan}} just needs to 
perform a simple merge on the subquery results to ensure the overall order of 
the union all result. However, for the current implementation of Phoenix, the 
outer {{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this 
information within {{UnionPlan}} and will still perform heavyweight processing 
based on the assumption that the overall result of union all is completely 
unordered.



> Optimize ClientAggregatePlan/ClientScanPlan when inner query plan is UnionPlan
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7397
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7397
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.2.0, 5.1.3
>            Reporter: chenglei
>            Priority: Major
>
> A common use case of union all is to union multiple homogeneous tables and 
> then further aggregate or sort , just as following sql statements：
> {code:java}
> select A, B,C, sum(...),sum(....)
> from (
> select a1 as A, b1 as B,c1 as C, ....
> from T1
> union all
> select a2,b2,c2,.....
> from T2
> union all
> select a3,b3,c3, ....
> )
> group by A,B,C
> having....
> {code}
> {code:java}
> select A, B,C, sum(...),sum(...)
> from (
> select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
> from T1
> group by a1,b1,c1
> union all
> select a2,b2,c2,.sum(...),sum(...)....
> from T2
> group by a2,b2,c2
> union all
> select a3,b3,c3, sum(...),sum(...)....
> group by a3,b3,c3
> )
> group by A,B,C
> having....
> {code}
> {code:java}
> select A, B,C...
> from (
> select a1 as A, b1 as B,c1 as C, ....
> from T1
> union all
> select a2,b2,c2,.....
> from T2
> union all
> select a3,b3,c3, ....
> )
> where ......
> order by A,B,C
> {code}
> For situations similar to above three SQL queries, if each subquery of union 
> all is actually ordered by A, B, and C, then {{UnionPlan}} just needs to 
> perform a simple merge on the subquery results to ensure the overall order of 
> the union all . However, for the current implementation of Phoenix, the outer 
> {{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this information 
> within {{UnionPlan}} and will still perform heavyweight processing based on 
> the assumption that the overall result of union all is completely unordered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (PHOENIX-7397) Optimize ClientAggregatePlan/ClientScanPlan when inner query plan is UnionPlan

Reply via email to