[jira] [Updated] (PHOENIX-7397) Optimize ClientAggregatePlan/ClientScanPlan when inner query is UnionPlan

chenglei (Jira) Thu, 19 Sep 2024 19:36:09 -0700


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


chenglei updated PHOENIX-7397:
------------------------------
    Description: 
A common use case of union all is to union multiple homogeneous tables and then 
further aggregate or sort , just as following sql statements：
{code:java}
select A, B,C, sum(...),sum(....)
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
from T3
)
group by A,B,C
having....
{code}


{code:java}
select A, B,C, sum(...),sum(...)
from (
select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
from T1
group by a1,b1,c1
union all
select a2,b2,c2,.sum(...),sum(...)....
from T2
group by a2,b2,c2
union all
select a3,b3,c3, sum(...),sum(...)....
from T3
group by a3,b3,c3
)
group by A,B,C
having....
{code}

{code:java}
select A, B,C...
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
from T3
)
where ......
order by A,B,C
{code}
For situations similar to above three SQL queries, if each subquery of union 
all is actually ordered by A, B, and C, then {{UnionPlan}} just needs to 
perform a simple merge on the subquery results to ensure the overall order of 
the union all . However, for the current implementation of Phoenix, the outer 
{{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this information 
within {{UnionPlan}} and will still perform heavyweight processing based on the 
assumption that the overall result of union all is completely unordered.


  was:
A common use case of union all is to union multiple homogeneous tables and then 
further aggregate or sort , just as following sql statements：
{code:java}
select A, B,C, sum(...),sum(....)
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
group by A,B,C
having....
{code}


{code:java}
select A, B,C, sum(...),sum(...)
from (
select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
from T1
group by a1,b1,c1
union all
select a2,b2,c2,.sum(...),sum(...)....
from T2
group by a2,b2,c2
union all
select a3,b3,c3, sum(...),sum(...)....
group by a3,b3,c3
)
group by A,B,C
having....
{code}

{code:java}
select A, B,C...
from (
select a1 as A, b1 as B,c1 as C, ....
from T1
union all
select a2,b2,c2,.....
from T2
union all
select a3,b3,c3, ....
)
where ......
order by A,B,C
{code}
For situations similar to above three SQL queries, if each subquery of union 
all is actually ordered by A, B, and C, then {{UnionPlan}} just needs to 
perform a simple merge on the subquery results to ensure the overall order of 
the union all . However, for the current implementation of Phoenix, the outer 
{{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this information 
within {{UnionPlan}} and will still perform heavyweight processing based on the 
assumption that the overall result of union all is completely unordered.



> Optimize ClientAggregatePlan/ClientScanPlan when inner query is UnionPlan
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-7397
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7397
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.2.0, 5.1.3
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 5.2.1, 5.3.0
>
>
> A common use case of union all is to union multiple homogeneous tables and 
> then further aggregate or sort , just as following sql statements：
> {code:java}
> select A, B,C, sum(...),sum(....)
> from (
> select a1 as A, b1 as B,c1 as C, ....
> from T1
> union all
> select a2,b2,c2,.....
> from T2
> union all
> select a3,b3,c3, ....
> from T3
> )
> group by A,B,C
> having....
> {code}
> {code:java}
> select A, B,C, sum(...),sum(...)
> from (
> select a1 as A, b1 as B,c1 as C, sum(...), sum(...),....
> from T1
> group by a1,b1,c1
> union all
> select a2,b2,c2,.sum(...),sum(...)....
> from T2
> group by a2,b2,c2
> union all
> select a3,b3,c3, sum(...),sum(...)....
> from T3
> group by a3,b3,c3
> )
> group by A,B,C
> having....
> {code}
> {code:java}
> select A, B,C...
> from (
> select a1 as A, b1 as B,c1 as C, ....
> from T1
> union all
> select a2,b2,c2,.....
> from T2
> union all
> select a3,b3,c3, ....
> from T3
> )
> where ......
> order by A,B,C
> {code}
> For situations similar to above three SQL queries, if each subquery of union 
> all is actually ordered by A, B, and C, then {{UnionPlan}} just needs to 
> perform a simple merge on the subquery results to ensure the overall order of 
> the union all . However, for the current implementation of Phoenix, the outer 
> {{ClientAggregatePlan}} or {{ClientScanPlan}} does not know this information 
> within {{UnionPlan}} and will still perform heavyweight processing based on 
> the assumption that the overall result of union all is completely unordered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (PHOENIX-7397) Optimize ClientAggregatePlan/ClientScanPlan when inner query is UnionPlan

Reply via email to