[ 
https://issues.apache.org/jira/browse/KYLIN-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444246#comment-17444246
 ] 

tanyao commented on KYLIN-5099:
-------------------------------

[~zhangyaqian] Thank you very much! When I set the property 
"kylin.cube.cubeplanner.expansion-threshold" to a smaller value, 2.5 for 
example, the final number of cuboids decreased from 256 to 80, and the final 
segment size decreased to 30M. This is very helpful. I also found an article 
about how Kylin 3 prunes cuboids: [Cuboid Pruning Optimization In Kylin_CN - 
Apache Software 
Foundation|https://cwiki.apache.org/confluence/display/KYLIN/Cuboid+Pruning+Optimization+In+Kylin_CN].
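
To illustrate why a smaller expansion threshold leaves fewer cuboids, here is a 
minimal sketch. It is not Kylin's actual planner code. It assumes the threshold 
acts as a multiplier on the base cuboid size to form a space budget, and it 
uses a simple smallest-first greedy pick. The function name, the cuboid ids, 
and the sizes are all made up for illustration.

```python
from typing import Dict, List


def select_cuboids(sizes_mb: Dict[str, float], base_id: str,
                   expansion_threshold: float) -> List[str]:
    """Pick cuboids smallest-first until a space budget is exhausted.

    Hypothetical simplification (an assumption, not Kylin's algorithm):
    budget = expansion_threshold * base cuboid size, and the base cuboid
    is always kept.
    """
    budget = expansion_threshold * sizes_mb[base_id]
    chosen = [base_id]
    used = sizes_mb[base_id]
    # Consider the remaining cuboids from smallest to largest.
    for cuboid_id, size in sorted(sizes_mb.items(), key=lambda kv: kv[1]):
        if cuboid_id == base_id:
            continue
        if used + size > budget:
            break  # adding this cuboid would exceed the budget
        chosen.append(cuboid_id)
        used += size
    return chosen


# Made-up sizes in MB; "ABC" stands for the base cuboid.
sizes = {"ABC": 10.0, "AB": 6.0, "AC": 5.0, "BC": 4.0,
         "A": 2.0, "B": 1.0, "C": 1.0}
loose = select_cuboids(sizes, "ABC", 15.0)
tight = select_cuboids(sizes, "ABC", 2.5)
print(len(loose), len(tight))
```

With the larger threshold every cuboid fits in the budget. With 2.5 the budget 
runs out before the largest non-base cuboid is added, so fewer cuboids are 
built and the segment is smaller.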
 

> parquet file size is too big in kylin4 with spark3 than kylin3 with mr
> ----------------------------------------------------------------------
>
>                 Key: KYLIN-5099
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5099
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v4.0.0
>            Reporter: tanyao
>            Priority: Blocker
>         Attachments: image-2021-10-15-10-43-54-830.png, 
> image-2021-10-15-10-44-49-178.png, image-2021-10-18-10-00-34-440.png, 
> image-2021-10-18-10-02-07-998.png, image-2021-10-18-10-06-15-839.png, 
> screenshot-1.png
>
>
> Hi,
> I am trying to use Spark 3.1.1 as the build engine in Kylin 4.0. The Hive 
> table is in ORC format with 2 million+ rows (200W+), 10 dimensions are 
> defined, and the original size is about 50M.
> When I build this cube with Kylin 4.0, the final Parquet files total 1G+, 
> that is, a single segment is about 1G+. However, when I use the same Hive 
> table data with the same cube model and dimensions in Kylin 3.1, the HBase 
> segment size is just 100M+. In the Kylin 4.0 case, all Spark parameters are 
> set automatically by Kylin.
> Why does this happen? Also, the build time in Kylin 4.0 is no faster than in 
> Kylin 3.1, or even worse. Both take about 10 minutes, so I cannot see the 
> benefit of Kylin 4.0.
> !image-2021-10-15-10-43-54-830.png!
>  
> !image-2021-10-15-10-44-49-178.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
