[
https://issues.apache.org/jira/browse/KYLIN-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442036#comment-17442036
]
Yaqian Zhang commented on KYLIN-5099:
-------------------------------------
[~tanyao] If you set the parameter "kylin.cube.cubeplanner.enabled" to true,
Kylin 4 prunes cuboids when you build the first segment. The degree of pruning
is controlled by the parameter "kylin.cube.cubeplanner.expansion-threshold":
the smaller the value you set, the more cuboids are pruned.
In my test, with "kylin.cube.cubeplanner.expansion-threshold" set to 2.5, the
number of cuboids after pruning in Kylin 4 was roughly the same as in Kylin 3.
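For reference, the two settings could look like the fragment below. This is only a sketch: it assumes they are placed in conf/kylin.properties (they could also be set as cube-level configuration overrides), and 2.5 is just the value from my test, not a general recommendation.

```properties
# Assumed location: conf/kylin.properties (or cube-level overrides)
# Enable the cube planner so cuboids are pruned when the first segment is built
kylin.cube.cubeplanner.enabled=true
# Smaller values prune more cuboids; 2.5 is the value used in the test above
kylin.cube.cubeplanner.expansion-threshold=2.5
```

Since pruning happens when the first segment is built, the change is probably best checked on a cube that has no segments yet.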
> parquet file size is too big in kylin4 with spark3 than kylin3 with mr
> ----------------------------------------------------------------------
>
> Key: KYLIN-5099
> URL: https://issues.apache.org/jira/browse/KYLIN-5099
> Project: Kylin
> Issue Type: Bug
> Affects Versions: v4.0.0
> Reporter: tanyao
> Priority: Blocker
> Attachments: image-2021-10-15-10-43-54-830.png,
> image-2021-10-15-10-44-49-178.png, image-2021-10-18-10-00-34-440.png,
> image-2021-10-18-10-02-07-998.png, image-2021-10-18-10-06-15-839.png,
> screenshot-1.png
>
>
> Hi,
> I am trying to use Spark 3.1.1 as the build engine in Kylin 4.0. The Hive
> table has 200W+ (2 million+) rows in ORC format, 10 dimensions are defined,
> and the original size is about 50 MB.
> When I build this cube with Kylin 4.0, the final Parquet files total 1 GB+,
> that is to say, a single segment is about 1 GB+. However, when I use the same
> Hive table data with the same cube model and dimensions, the HBase segment
> size is just 100 MB+. In the Kylin 4.0 case, all Spark parameters are set
> automatically by Kylin.
> Why does this happen? And the build time in Kylin 4.0 is no faster than in
> Kylin 3.1, or even worse! Both take about 10 minutes, so I cannot see the
> benefits of Kylin 4.0.
> !image-2021-10-15-10-43-54-830.png!
>
> !image-2021-10-15-10-44-49-178.png!
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)