[ https://issues.apache.org/jira/browse/KYLIN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Zhu updated KYLIN-4888:
----------------------------
Description:
When a union query runs on the Spark engine, UnionPlan transforms OLAPUnionRel
into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's distinct
transformation is used, but it is applied inside the loop that traverses the
DataFrame collection. As a result we do not get the expected optimized,
flattened union plan (Spark's CombineUnions rule optimizes the distinct, but the
nested union plan is not flattened), and the Spark DAG contains many stages.
Instead, the distinct transformation should be applied only once, at the end.
was:
When a union query runs on the Spark engine, UnionPlan transforms OLAPUnionRel
into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's distinct
transformation is used, but it is applied inside the loop that traverses the
DataFrame collection, so the Spark DAG contains many stages. Instead, the
distinct transformation should be applied only once, at the end.
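To make the difference described above concrete, here is a minimal toy model in Python. It is not Kylin or Spark code: the names `Scan`, `UnionAll`, `Distinct`, `per_step_distinct`, `distinct_once`, `evaluate` and `count_distincts` are hypothetical. It builds the plan shape produced by calling distinct inside the loop and the shape produced by one flat union followed by a single distinct, then checks that both return the same rows. The number of Distinct nodes is used as a rough, assumed proxy for the extra aggregation and shuffle stages Spark would schedule. In Spark's own API the proposed shape presumably corresponds to chaining `Dataset.union` over all inputs and calling `distinct()` once on the result.

```python
from dataclasses import dataclass


# Hypothetical toy plan nodes, loosely modeled on logical-plan operators.
@dataclass(frozen=True)
class Scan:
    """Leaf node: one input DataFrame, represented as a tuple of rows."""
    rows: tuple


@dataclass(frozen=True)
class UnionAll:
    """Bag union of its children (keeps duplicates, like SQL UNION ALL)."""
    children: tuple


@dataclass(frozen=True)
class Distinct:
    """Duplicate elimination over its child."""
    child: object


def per_step_distinct(inputs):
    # Pattern from the report: union(...) followed by distinct inside the loop,
    # which nests a Distinct between every pair of unions.
    plan = inputs[0]
    for other in inputs[1:]:
        plan = Distinct(UnionAll((plan, other)))
    return plan


def distinct_once(inputs):
    # Proposed pattern: one flat union of all inputs, then a single distinct.
    return Distinct(UnionAll(tuple(inputs)))


def evaluate(plan):
    """Return the rows a plan produces; Distinct results come back sorted."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, UnionAll):
        return [row for child in plan.children for row in evaluate(child)]
    if isinstance(plan, Distinct):
        return sorted(set(evaluate(plan.child)))
    raise TypeError(f"unknown plan node: {plan!r}")


def count_distincts(plan):
    """Count Distinct nodes (a rough proxy for extra shuffle stages)."""
    if isinstance(plan, Scan):
        return 0
    if isinstance(plan, UnionAll):
        return sum(count_distincts(child) for child in plan.children)
    return 1 + count_distincts(plan.child)


if __name__ == "__main__":
    inputs = [Scan(((1,), (2,))), Scan(((2,), (3,))),
              Scan(((3,), (4,))), Scan(((4,), (1,)))]
    nested = per_step_distinct(inputs)
    flat = distinct_once(inputs)
    assert evaluate(nested) == evaluate(flat)  # same result set
    print(count_distincts(nested), count_distincts(flat))  # 3 1
```

Both shapes return the same rows, because removing duplicates early never changes the result of a final duplicate removal. Only the flat shape keeps deduplication to a single step over one merged union, which is the plan the description asks for.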
> Performance optimization of union query with spark engine
> ----------------------------------------------------------
>
> Key: KYLIN-4888
> URL: https://issues.apache.org/jira/browse/KYLIN-4888
> Project: Kylin
> Issue Type: Improvement
> Components: Spark Engine
> Affects Versions: v4.0.0-alpha
> Reporter: Feng Zhu
> Assignee: Feng Zhu
> Priority: Major
> Fix For: v4.0.0-beta
--
This message was sent by Atlassian Jira
(v8.3.4#803005)