[jira] [Updated] (KYLIN-4888) Performance optimization of union query with spark engine

Feng Zhu (Jira) Wed, 27 Jan 2021 19:02:11 -0800


     [ https://issues.apache.org/jira/browse/KYLIN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Feng Zhu updated KYLIN-4888:
----------------------------
    Description: 
When a union query is executed with the Spark engine, UnionPlan transforms
OLAPUnionRel into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's
distinct transformation is applied, but it is applied inside the loop that
traverses the DataFrame collection. As a result, we don't get the expected
optimized, flattened union plan (Spark's CombineUnions rule optimizes the
distinct, but the nested union plan is not flattened), and the Spark DAG
contains many stages. Actually, the distinct transformation should be applied
only once, at the end.
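The difference between the two plan shapes can be illustrated with a toy logical-plan model in Python. This is a sketch under stated assumptions, not Kylin's UnionPlan or Spark's Catalyst code: the classes `Scan`, `UnionAll`, `Distinct` and the functions `union_in_loop`, `union_then_distinct`, `combine_unions` are hypothetical stand-ins. `combine_unions` only mimics the behaviour described above, where a CombineUnions-style rule merges directly nested unions while a Distinct sitting between them blocks the merge. In Spark's Dataset API the two shapes roughly correspond to `dfs.reduce((a, b) => a.union(b).distinct())` (distinct inside the loop) versus `dfs.reduce(_.union(_)).distinct()` (distinct once at the end).

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical toy plan nodes (NOT Spark Catalyst classes).
@dataclass(frozen=True)
class Scan:
    name: str  # one input DataFrame, e.g. one branch of the SQL UNION


@dataclass(frozen=True)
class UnionAll:
    children: tuple  # concatenates rows, no de-duplication


@dataclass(frozen=True)
class Distinct:
    child: object  # removes duplicate rows (typically a shuffle in real Spark)


def union_in_loop(inputs):
    """Problem shape: distinct applied on every loop iteration."""
    plan = inputs[0]
    for df in inputs[1:]:
        plan = Distinct(UnionAll((plan, df)))
    return plan


def union_then_distinct(inputs):
    """Fixed shape: union all inputs first, apply distinct once at the end."""
    plan = reduce(lambda acc, df: UnionAll((acc, df)), inputs)
    return Distinct(plan)


def combine_unions(plan):
    """Toy CombineUnions-style rewrite: flatten directly nested UnionAll nodes.

    A Distinct between two unions is kept as-is, so it blocks flattening.
    """
    if isinstance(plan, UnionAll):
        flat = []
        for child in plan.children:
            child = combine_unions(child)
            if isinstance(child, UnionAll):
                flat.extend(child.children)  # child is already flat
            else:
                flat.append(child)
        return UnionAll(tuple(flat))
    if isinstance(plan, Distinct):
        return Distinct(combine_unions(plan.child))
    return plan


def count_nodes(plan, node_type):
    """Count nodes of the given type in the plan tree."""
    here = 1 if isinstance(plan, node_type) else 0
    if isinstance(plan, UnionAll):
        return here + sum(count_nodes(c, node_type) for c in plan.children)
    if isinstance(plan, Distinct):
        return here + count_nodes(plan.child, node_type)
    return here


if __name__ == "__main__":
    inputs = [Scan(f"q{i}") for i in range(1, 5)]
    looped = combine_unions(union_in_loop(inputs))
    once = combine_unions(union_then_distinct(inputs))
    print("distinct in loop:", count_nodes(looped, Distinct), "Distinct,",
          count_nodes(looped, UnionAll), "UnionAll")
    print("distinct once:", count_nodes(once, Distinct), "Distinct,",
          count_nodes(once, UnionAll), "UnionAll")
```

With four inputs, the looped plan still has 3 Distinct and 3 UnionAll nodes after the rewrite, whereas the distinct-once plan collapses to a single Distinct over one four-way UnionAll. Since each Distinct typically requires a shuffle in Spark, fewer Distinct nodes generally means fewer stages in the DAG.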

  was:
When a union query is executed with the Spark engine, UnionPlan transforms
OLAPUnionRel into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's
distinct transformation is applied, but it is applied inside the loop that
traverses the DataFrame collection, so the Spark DAG contains many stages.
Actually, the distinct transformation should be applied only once, at the end.


>  Performance optimization of union query with spark engine
> ----------------------------------------------------------
>
>                 Key: KYLIN-4888
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4888
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Spark Engine
>    Affects Versions: v4.0.0-alpha
>            Reporter: Feng Zhu
>            Assignee: Feng Zhu
>            Priority: Major
>             Fix For: v4.0.0-beta
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
