[https://issues.apache.org/jira/browse/KYLIN-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429132#comment-17429132]
Yaqian Zhang commented on KYLIN-5099:
-------------------------------------
Hi,
Have you enabled the cube planner?
My guess is that the size difference between Kylin 3 and Kylin 4 is so large
because Kylin 4 builds many more cuboids than Kylin 3: the cube planner's
pruning logic in Kylin 4 differs from that in Kylin 3.
If you have enabled the cube planner, compare the number of cuboids shown on
the cube planner page in each version to confirm whether Kylin 4 has more
cuboids than Kylin 3.
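The following back-of-the-envelope sketch shows why the cuboid count matters so much with 10 dimensions. It assumes the standard counting rules for a single aggregation group: a full cube over n dimensions has 2^n cuboids, a k-level hierarchy allows only k + 1 combinations, a joint group is all-or-nothing (2 choices), and mandatory dimensions add no multiplier. The helper name and the example dimension layout are hypothetical. The sketch does not model the cube planner's cost-based pruning. It only illustrates how quickly the count changes when pruning rules differ.

```python
from math import prod
from typing import Sequence


def cuboids_in_group(normal: int,
                     hierarchies: Sequence[int] = (),
                     joints: Sequence[int] = ()) -> int:
    """Rough cuboid count for ONE aggregation group (hypothetical helper).

    normal      -- dimensions with no special rule: each doubles the count
    hierarchies -- size k of each hierarchy: only k + 1 prefixes are valid
    joints      -- each joint group is all-or-nothing: 2 choices per group
    Mandatory dimensions are always present, so they add no multiplier and
    are simply not passed in.
    """
    count = 2 ** normal
    count *= prod(k + 1 for k in hierarchies)  # prod of empty input is 1
    count *= 2 ** len(joints)
    return count


# Full cube over the 10 dimensions mentioned in the issue: 2^10 cuboids.
full = cuboids_in_group(10)
# Hypothetical layout of the same 10 dimensions: 4 normal dimensions,
# one 3-level hierarchy and one 3-dimension joint group.
pruned = cuboids_in_group(4, hierarchies=[3], joints=[3])
print(full, pruned, full / pruned)  # 1024 128 8.0
```

Cuboids that are never materialized cost nothing to store. A several-fold difference in cuboid count can therefore show up as a comparable difference in segment size, although storage does not scale exactly linearly because lower-dimensional cuboids are smaller. The actual counts on the cube planner page are what confirm or rule out this explanation.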
> parquet file size is much larger in kylin4 with spark3 than in kylin3 with mr
> -----------------------------------------------------------------------------
>
> Key: KYLIN-5099
> URL: https://issues.apache.org/jira/browse/KYLIN-5099
> Project: Kylin
> Issue Type: Bug
> Affects Versions: v4.0.0
> Reporter: tanyao
> Priority: Blocker
> Attachments: image-2021-10-15-10-43-54-830.png, image-2021-10-15-10-44-49-178.png
>
>
> Hi,
> I am trying to use Spark 3.1.1 as the build engine in Kylin 4.0. The Hive
> table is stored as ORC and has more than 2 million (200W+) rows, with 10
> dimensions defined; its original size is about 50 MB.
> When I build this cube with Kylin 4.0, the resulting Parquet files total
> more than 1 GB, i.e. a single segment is over 1 GB. However, when I use the
> same Hive table data with the same cube model and dimensions in Kylin 3.1,
> the HBase segment is only a little over 100 MB. In the Kylin 4.0 case, all
> Spark parameters are set automatically by Kylin.
> Why does this happen? Also, building is not faster in Kylin 4.0 than in
> Kylin 3.1; if anything it is worse. Both take about 10 minutes, so I cannot
> see the benefits of Kylin 4.0.
> !image-2021-10-15-10-43-54-830.png!
>
> !image-2021-10-15-10-44-49-178.png!
>
>
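The reporter's figures can be reduced to rough ratios. This short sketch assumes that "W" is the Chinese unit wan (万, 10,000) and that "1G" means 1024 MB. The sizes in the report are approximate or lower bounds ("+"), so the ratios indicate orders of magnitude only.

```python
# Figures as quoted in the issue ("+" in the report marks a lower bound).
# Assumptions: "W" is the Chinese unit wan (10,000) and 1 G = 1024 M.
WAN = 10_000

rows = 200 * WAN           # "200W+" rows
source_mb = 50             # ORC Hive table, "about 50M"
kylin3_segment_mb = 100    # HBase segment, "100M+"
kylin4_segment_mb = 1024   # Parquet files per segment, "1G+"

kylin4_vs_kylin3 = kylin4_segment_mb / kylin3_segment_mb
kylin4_vs_source = kylin4_segment_mb / source_mb

print(f"rows: at least {rows:,}")                               # rows: at least 2,000,000
print(f"Kylin 4 vs Kylin 3 segment: ~{kylin4_vs_kylin3:.1f}x")  # ~10.2x
print(f"Kylin 4 segment vs source:  ~{kylin4_vs_source:.1f}x")  # ~20.5x
```

Whatever explains the gap must therefore account for roughly an order of magnitude between the segments produced by the two versions.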