[ https://issues.apache.org/jira/browse/KYLIN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyang updated KYLIN-2442:
--------------------------
Description:
Right now the expansion rate is calculated as "Cube Size / Raw Data Size",
where the raw data size is the size of the intermediate Hive table. This means
the Raw Data Size depends on the compression format of the intermediate table,
which affects the correctness of the expansion rate and of other estimates
based on the raw data size.
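To make the compression dependence concrete, here is a minimal Java sketch
(class and method names are hypothetical, not Kylin code). It measures the same
flat-table text once as plain UTF-8 bytes and once after GZIP, then divides a
hypothetical cube size by each. GZIP is only an assumed stand-in for whatever
compression the intermediate table actually uses; the point is that the same
data yields different "Raw Data Size" values, and therefore different
expansion rates, depending on the codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class FlatTableSizeDemo {
    // Size of the flat-table text stored without compression (UTF-8 bytes).
    static long uncompressedSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size of the same text after GZIP; a stand-in (assumption) for a
    // compressed intermediate table. Real Hive storage formats differ.
    static long gzipSize(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for an in-memory stream, but surface it if it happens.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Expansion rate as currently defined: Cube Size / Raw Data Size.
    static double expansionRate(long cubeSize, long rawDataSize) {
        return (double) cubeSize / rawDataSize;
    }

    public static void main(String[] args) {
        // Hypothetical, highly repetitive flat-table rows.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("2017-02-14,US,Electronics,").append(i % 10).append('\n');
        }
        String flatTable = sb.toString();
        long cubeSize = 50_000; // hypothetical cube size in bytes

        long plain = uncompressedSize(flatTable);
        long gz = gzipSize(flatTable);
        System.out.printf("uncompressed: %d bytes, expansion rate %.2f%n",
                plain, expansionRate(cubeSize, plain));
        System.out.printf("gzip:         %d bytes, expansion rate %.2f%n",
                gz, expansionRate(cubeSize, gz));
    }
}
```

The cube is identical in both runs; only the way the intermediate table is
stored changes, yet the reported expansion rate moves with it.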
The change intends to calculate the Raw Data Size from the uncompressed cell
values of the intermediate Hive table: every cell is converted to its string
form, and the byte sizes of those strings in UTF-8 encoding are summed. The
result serves as the Raw Data Size and is stable regardless of compression and
other environment parameters.
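A minimal Java sketch of the proposed calculation, assuming the flat table's
rows are available in memory as Object arrays (in Kylin they would come from
the intermediate Hive table; the class and method names here are hypothetical,
not Kylin's actual implementation). Each non-null cell is turned into its
string form and its UTF-8 byte length is added to the total. Counting null
cells as 0 bytes is an assumption; the issue does not say how nulls are
treated.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RawDataSize {
    // Raw Data Size = sum of UTF-8 byte lengths of each cell's string form.
    // Null cells contribute 0 bytes (assumption, not stated in the issue).
    static long rawDataSize(List<Object[]> rows) {
        long total = 0;
        for (Object[] row : rows) {
            for (Object cell : row) {
                if (cell != null) {
                    total += String.valueOf(cell).getBytes(StandardCharsets.UTF_8).length;
                }
            }
        }
        return total;
    }

    // Expansion rate = Cube Size / Raw Data Size.
    static double expansionRate(long cubeSize, long rawDataSize) {
        if (rawDataSize <= 0) {
            throw new IllegalArgumentException("raw data size must be positive");
        }
        return (double) cubeSize / rawDataSize;
    }

    public static void main(String[] args) {
        // Hypothetical flat-table rows; "\u00fc" takes 2 bytes in UTF-8.
        List<Object[]> rows = Arrays.asList(
                new Object[] {"2017-02-14", "US", 42, 3.5},
                new Object[] {"2017-02-14", "Z\u00fcrich", 7, null});
        long raw = rawDataSize(rows);
        System.out.println("raw data size = " + raw + " bytes");
        System.out.println("expansion rate = " + expansionRate(120, raw));
    }
}
```

Because the result depends only on the cell values, the same rows give the
same Raw Data Size whether the intermediate table is stored as plain text,
GZIP, or any other format. Note that non-ASCII values are counted by their
encoded byte size, not by character count.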
was: Right now the expansion rate is calculated as "Cube Size / Raw Data
Size", where the raw data size is the size of the intermediate Hive table. This
makes the Raw Data Size depend on the compression format of the intermediate
table, which affects the correctness of the expansion rate and of the estimates
based on the raw data size.
> Re-calculate expansion rate, count raw data size regardless of flat table compression
> -------------------------------------------------------------------------------------
>
> Key: KYLIN-2442
> URL: https://issues.apache.org/jira/browse/KYLIN-2442
> Project: Kylin
> Issue Type: Improvement
> Reporter: liyang
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)