[
https://issues.apache.org/jira/browse/KYLIN-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010215#comment-17010215
]
weibin0516 edited comment on KYLIN-4321 at 1/8/20 12:49 AM:
------------------------------------------------------------
Past experience and a large amount of test data show that Spark performs
significantly better than Hive (MapReduce).
The following screenshots show TPC-DS test results for Spark and Hive:
!screenshot-2.png!
!screenshot-1.png!
Currently, when a cube is built with the Spark engine, the `Create fact
distinct columns` step still uses MapReduce by default. We propose running this
step on Spark by default instead, that is, changing the default value of
`kylin.engine.spark-fact-distinct` to true.
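As a rough illustration of what flipping the default means, the Java sketch below resolves `kylin.engine.spark-fact-distinct` from a `java.util.Properties` object with a caller-supplied default. The class `SparkFactDistinctFlag` and the helper `sparkFactDistinct` are hypothetical and are not Kylin's actual configuration code. The sketch only shows the intended behavior: a deployment that never sets the key picks up the Spark step once the default is true, while an explicit `false` still keeps the MapReduce step.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SparkFactDistinctFlag {
    // Hypothetical helper (not Kylin's real config API): returns the configured
    // value of kylin.engine.spark-fact-distinct, or defaultValue when the key is unset.
    static boolean sparkFactDistinct(Properties props, boolean defaultValue) {
        return Boolean.parseBoolean(
                props.getProperty("kylin.engine.spark-fact-distinct", String.valueOf(defaultValue)));
    }

    public static void main(String[] args) throws IOException {
        Properties unset = new Properties();
        // Key not set, old default (false): the step would run on MapReduce.
        System.out.println(sparkFactDistinct(unset, false));
        // Key not set, proposed default (true): the step would run on Spark.
        System.out.println(sparkFactDistinct(unset, true));

        // An explicit setting in the properties file still overrides the default.
        Properties explicit = new Properties();
        explicit.load(new StringReader("kylin.engine.spark-fact-distinct=false\n"));
        System.out.println(sparkFactDistinct(explicit, true));
    }
}
```

Running `main` prints `false`, `true`, `false`: only users who rely on the default see a change, and anyone who needs the MapReduce step can still pin the key to `false`.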
was (Author: codingforfun):
Past experience and a large amount of test data show that Spark performs
significantly better than Hive (MapReduce).
!screenshot-2.png!
!screenshot-1.png!
Currently, when a cube is built with the Spark engine, the `Create fact
distinct columns` step still uses MapReduce by default. We propose running this
step on Spark by default instead, that is, changing the default value of
`kylin.engine.spark-fact-distinct` to true.
> Create fact distinct columns using spark by default when build engine is spark
> ------------------------------------------------------------------------------
>
> Key: KYLIN-4321
> URL: https://issues.apache.org/jira/browse/KYLIN-4321
> Project: Kylin
> Issue Type: Improvement
> Reporter: weibin0516
> Assignee: weibin0516
> Priority: Major
> Fix For: v3.1.0
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)