[ https://issues.apache.org/jira/browse/KYLIN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyang updated KYLIN-2442:
--------------------------
    Description: 
Right now the expansion rate is calculated as "Cube Size / Raw Data Size", where
the raw data size is the size of the intermediate Hive table. This means the Raw
Data Size depends on the compression format of the intermediate table, which
affects the correctness of the expansion rate and of other estimates based on
the raw data size.
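As a rough illustration of the problem (not Kylin's code), the Python sketch below measures the same hypothetical flat-table rows two ways: as plain tab-delimited text, and after `zlib` compression, which stands in here for whatever codec the intermediate table actually uses. The row values and the 5000-byte cube size are made up; the point is only that "Cube Size / Raw Data Size" shifts by a large factor depending on which size is used as the denominator.

```python
import zlib


def expansion_rate(cube_size: int, raw_data_size: int) -> float:
    """Expansion rate as described in the issue: Cube Size / Raw Data Size."""
    return cube_size / raw_data_size


# Hypothetical flat-table rows, serialized as tab-delimited UTF-8 text.
rows = [("2017-02-14", "beijing", 100)] * 1000
text = "".join("\t".join(str(c) for c in row) + "\n" for row in rows).encode("utf-8")

plain_size = len(text)                       # table stored without compression
compressed_size = len(zlib.compress(text))   # same table stored with a compressing codec

cube_size = 5000  # hypothetical cube size in bytes

print(expansion_rate(cube_size, plain_size))
print(expansion_rate(cube_size, compressed_size))
```

Swapping the codec (or its compression level) changes `compressed_size` and therefore the reported rate, even though the underlying data is identical.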

The change intends to calculate the Raw Data Size from the uncompressed cell
values of the intermediate Hive table: each cell is converted to its string
form, and the byte sizes of these strings in UTF-8 encoding are summed. The
result serves as the Raw Data Size and is stable regardless of compression and
other environment parameters.
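A minimal Python sketch of the proposed computation, under a few assumptions the issue does not spell out: Python's `str()` stands in for Hive's string form of a value (the two can differ, e.g. for booleans and floats), a NULL cell is counted as 0 bytes, and field delimiters and row terminators are not counted. The function names `cell_byte_size` and `raw_data_size` are hypothetical.

```python
from typing import Iterable, Sequence


def cell_byte_size(cell: object) -> int:
    """UTF-8 byte length of a cell's string form.

    Assumption: a NULL (None) cell contributes 0 bytes.
    """
    if cell is None:
        return 0
    return len(str(cell).encode("utf-8"))


def raw_data_size(rows: Iterable[Sequence[object]]) -> int:
    """Sum of the UTF-8 byte sizes of every cell's string form."""
    return sum(cell_byte_size(cell) for row in rows for cell in row)


rows = [
    ("2017-02-14", "北京", 100),  # 10 + 6 + 3 bytes
    ("2017-02-15", None, 7),      # 10 + 0 + 1 bytes
]
print(raw_data_size(rows))  # 30
```

Because only the cell values enter the sum, the result is the same whether the flat table is stored as plain text, compressed text, or a columnar format.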

  was:Right now the expansion rate is calculated as "Cube Size / Raw Data 
Size". And the raw data size is the size of intermediate hive table. This 
causes the Raw Data Size to depend on the compression format of the 
intermediate table, and affects the correctness of expansion rate and the 
estimates based on the raw data size.


> Re-calculate expansion rate, count raw data size regardless of flat table compression
> -------------------------------------------------------------------------------------
>
>                 Key: KYLIN-2442
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2442
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: liyang



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
