[jira] [Updated] (KYLIN-4322) Cost–benefit of compression HBase result

ZhouKang (Jira) Tue, 31 Dec 2019 00:57:59 -0800


     [ 
https://issues.apache.org/jira/browse/KYLIN-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ZhouKang updated KYLIN-4322:
----------------------------
    Description: 
kylin.storage.hbase.endpoint-compress-result is TRUE by default.

In our production environment, when the HBase scan result is larger than 200M, 
compressing the data takes more than 10s.

This can be seen in the HBase logs:
||Size||avg rate||max rate||avg time||max time||
|<1M|0.12|0.25|0.18ms|0.7s|
|1M ~ 10M|0.39|0.97|0.2s|0.6s|
|10M ~ 100M|0.47|0.81|2s|6.3s|
|>100M|0.95|0.96|15.7s|24.8s|

Notes:
 # rate: compressed data size / original data size
 # When the source data is smaller than 1M, the compressed data may be larger 
than the source data. Row 1 of the table therefore only counts the cases where 
the compressed data is smaller than the source data.
 # In our environment, for 65% of the results smaller than 1M, the compressed 
data is larger than the source data.
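
To illustrate the "rate" definition and why a small result can grow when compressed, here is a minimal sketch. It assumes a DEFLATE-style compressor (Python's zlib), which is not necessarily what the HBase endpoint uses, and both payloads are hypothetical rather than taken from our logs.

```python
import random
import zlib


def compression_rate(data: bytes) -> float:
    """Return compressed size / original size, the 'rate' used in the table."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical payloads, not taken from the production logs.
rng = random.Random(0)
high_entropy = bytes(rng.getrandbits(8) for _ in range(1024))  # hard to compress
repetitive = b"kylin" * 200  # easy to compress

print(compression_rate(high_entropy))  # above 1: the output is larger than the input
print(compression_rate(repetitive))    # well below 1
```

A rate above 1 means compression only added overhead, which is the case described in the last two notes.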

When the source data is smaller than 10M, the data transmission latency is 
acceptable. When the data is larger than 100M, compression takes a long time 
and saves little (average rate 0.95).
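
The trade-off can be sketched as a rough estimate: compression is worth it only if the transfer time it saves exceeds the time spent compressing. The function name, the bandwidth values, and the representative sizes per bucket below are assumptions for illustration (with "M" read as megabytes); the rates and times are the averages from the table. Decompression time on the receiving side is ignored.

```python
def compression_pays_off(size_mb: float, rate: float, compress_s: float,
                         bandwidth_mb_s: float) -> bool:
    """True if the transfer time saved exceeds the time spent compressing."""
    saved_mb = size_mb * (1.0 - rate)      # bytes no longer sent over the wire
    saved_s = saved_mb / bandwidth_mb_s    # transfer time avoided
    return saved_s > compress_s


# (bucket, assumed representative size in MB, avg rate, avg compression time in s)
buckets = [
    ("1M ~ 10M", 10.0, 0.39, 0.2),
    ("10M ~ 100M", 100.0, 0.47, 2.0),
    (">100M", 200.0, 0.95, 15.7),
]

# Assumed link speeds in MB/s; not measured in our environment.
for bandwidth in (100.0, 10.0):
    for name, size_mb, rate, seconds in buckets:
        verdict = compression_pays_off(size_mb, rate, seconds, bandwidth)
        print(f"{bandwidth:>5} MB/s  {name:<11} pays off: {verdict}")
```

Under these assumptions, the >100M bucket does not pay off at either link speed, while the middle buckets pay off only on the slower link.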

 

So I think kylin.storage.hbase.endpoint-compress-result should be FALSE by 
default.

 

  was:
kylin.storage.hbase.endpoint-compress-result is TRUE by default.

In our production environment, when the HBase scan result is larger than 200M, 
compressing the data takes more than 10s.

This can be seen in the HBase logs:
||Size||avg rate||max rate||avg time||max time||
|<1M|0.12|0.25|0.18ms|0.7s|
|1M ~ 10M|0.39|0.97|0.2s|0.6s|
|10M ~ 100M|0.47|0.81|2s|6.3s|
|>100M|0.95|0.96|15.7s|24.8s|

rate: compressed data size / original data size

Please also note that when the source data is smaller than 1M, the compressed 
data is larger than the source data in 65% of cases.

When the source data is smaller than 10M, the data transmission latency is 
acceptable. When the data is larger than 100M, compression takes a long time.

 

So I think kylin.storage.hbase.endpoint-compress-result should be FALSE by 
default.

 


> Cost–benefit of compression HBase result
> ----------------------------------------
>
>                 Key: KYLIN-4322
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4322
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: ZhouKang
>            Priority: Major
>
> kylin.storage.hbase.endpoint-compress-result is TRUE by default.
> In our production environment, when the HBase scan result is larger than 
> 200M, compressing the data takes more than 10s.
> This can be seen in the HBase logs:
> ||Size||avg rate||max rate||avg time||max time||
> |<1M|0.12|0.25|0.18ms|0.7s|
> |1M ~ 10M|0.39|0.97|0.2s|0.6s|
> |10M ~ 100M|0.47|0.81|2s|6.3s|
> |>100M|0.95|0.96|15.7s|24.8s|
> Notes:
>  # rate: compressed data size / original data size
>  # When the source data is smaller than 1M, the compressed data may be larger 
> than the source data. Row 1 of the table therefore only counts the cases 
> where the compressed data is smaller than the source data.
>  # In our environment, for 65% of the results smaller than 1M, the compressed 
> data is larger than the source data.
> When the source data is smaller than 10M, the data transmission latency is 
> acceptable. When the data is larger than 100M, compression takes a long time.
>  
> So I think kylin.storage.hbase.endpoint-compress-result should be FALSE by 
> default.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
