[
https://issues.apache.org/jira/browse/KYLIN-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443683#comment-17443683
]
ASF GitHub Bot commented on KYLIN-5099:
---------------------------------------
hit-lacus merged pull request #1766:
URL: https://github.com/apache/kylin/pull/1766
> parquet file size is too big in kylin4 with spark3 than kylin3 with mr
> ----------------------------------------------------------------------
>
> Key: KYLIN-5099
> URL: https://issues.apache.org/jira/browse/KYLIN-5099
> Project: Kylin
> Issue Type: Bug
> Affects Versions: v4.0.0
> Reporter: tanyao
> Priority: Blocker
> Attachments: image-2021-10-15-10-43-54-830.png,
> image-2021-10-15-10-44-49-178.png, image-2021-10-18-10-00-34-440.png,
> image-2021-10-18-10-02-07-998.png, image-2021-10-18-10-06-15-839.png,
> screenshot-1.png
>
>
> Hi,
> I am trying to use Spark 3.1.1 as the build engine in Kylin 4.0. The Hive
> table has 2 million+ (200W+) rows in ORC format with 10 dimensions defined,
> and its original size is about 50 MB.
> When I use Kylin 4.0 to build this cube, the final Parquet files total
> 1 GB+; that is to say, a single segment is about 1 GB+. However, when I
> use the same Hive table data with the same cube model and dimensions, the
> HBase segment size is just 100 MB+. In the Kylin 4.0 case, all Spark
> parameters are set automatically by Kylin.
> Why does this happen? Also, the build time in Kylin 4.0 is no faster than in
> Kylin 3.1, or even worse: both take about 10 minutes. I cannot see the
> benefits of Kylin 4.0.
> !image-2021-10-15-10-43-54-830.png!
>
> !image-2021-10-15-10-44-49-178.png!
>
>