[
https://issues.apache.org/jira/browse/KYLIN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280428#comment-17280428
]
ASF GitHub Bot commented on KYLIN-4888:
---------------------------------------
hit-lacus merged pull request #1562:
URL: https://github.com/apache/kylin/pull/1562
> Performance optimization of union query with spark engine
> ----------------------------------------------------------
>
> Key: KYLIN-4888
> URL: https://issues.apache.org/jira/browse/KYLIN-4888
> Project: Kylin
> Issue Type: Improvement
> Components: Spark Engine
> Affects Versions: v4.0.0-alpha
> Reporter: Feng Zhu
> Assignee: Feng Zhu
> Priority: Major
> Fix For: v4.0.0-GA
>
> Attachments: spark_union_plan_comparison, stages before.png,
> stages_after.png
>
>
> When a union query runs on the Spark engine, UnionPlan transforms
> OLAPUnionRel into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's
> distinct transformation is used, but it is applied inside the loop that
> traverses the DataFrame collection. As a result we do not get the expected
> optimized, flattened union plan (Spark's CombineUnions rule optimizes the
> distinct, but the nested union plan is not flattened), and the Spark DAG
> contains many stages. Actually, the distinct transformation should be
> applied only once, at the end.
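
The difference between the two plan shapes described above can be sketched with a toy model. This is only an illustration in plain Python, not Kylin or Spark code: the Scan, Union and Distinct classes and the helper functions are hypothetical stand-ins for DataFrames and plan nodes, and the sketch makes no claim about how the Spark optimizer actually rewrites plans. It only shows that applying distinct inside the loop nests one Distinct per step, while a single flat union followed by one distinct returns the same rows.

```python
from functools import reduce


class Scan:
    """Leaf node standing in for one input DataFrame (hypothetical)."""
    def __init__(self, rows):
        self.rows = list(rows)


class Union:
    """UNION ALL over any number of children; duplicates are kept."""
    def __init__(self, children):
        self.children = list(children)


class Distinct:
    """Deduplication over a single child."""
    def __init__(self, child):
        self.child = child


def distinct_in_loop(inputs):
    # Pattern described in the issue: distinct is applied at every step of
    # the traversal, so each pairwise union gets its own Distinct wrapper.
    # Assumes at least two inputs.
    return reduce(lambda acc, nxt: Distinct(Union([acc, nxt])), inputs)


def distinct_once(inputs):
    # Proposed pattern: one flat union of all inputs, one distinct at the end.
    return Distinct(Union(inputs))


def count_distinct(plan):
    """Count Distinct nodes in a plan tree."""
    if isinstance(plan, Scan):
        return 0
    if isinstance(plan, Union):
        return sum(count_distinct(child) for child in plan.children)
    return 1 + count_distinct(plan.child)


def execute(plan):
    """Evaluate a plan tree into a list of rows."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Union):
        return [row for child in plan.children for row in execute(child)]
    # Distinct: drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(execute(plan.child)))


inputs = [Scan([1, 2]), Scan([2, 3]), Scan([3, 4]), Scan([1, 4])]
nested = distinct_in_loop(inputs)
flat = distinct_once(inputs)
print(count_distinct(nested), count_distinct(flat))
print(sorted(execute(nested)) == sorted(execute(flat)))
```

With four inputs the loop variant builds three nested Distinct nodes and the flat variant builds one, while both evaluate to the same set of rows.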