[ https://issues.apache.org/jira/browse/KYLIN-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027246#comment-17027246 ]
ASF subversion and git services commented on KYLIN-4322:
--------------------------------------------------------
Commit 26cf1f8ed217c96329d8dcbd8a00ef1d67023fca in kylin's branch
refs/heads/master from Kang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=26cf1f8 ]
Revert "KYLIN-4322: set storage.hbase.endpoint-compress-result default value …
(#1033)"
This reverts commit f41c6c8198e5cad295e9212c6a0047d83bd54ae2.
> Cost–benefit of compression HBase result
> ----------------------------------------
>
> Key: KYLIN-4322
> URL: https://issues.apache.org/jira/browse/KYLIN-4322
> Project: Kylin
> Issue Type: Bug
> Reporter: ZhouKang
> Assignee: ZhouKang
> Priority: Major
> Fix For: v3.1.0
>
>
> kylin.storage.hbase.endpoint-compress-result defaults to TRUE.
> In our production environment, when an HBase scan result is larger than
> 200M, compressing it takes more than 10 s.
> This can be seen in the HBase logs:
> ||Size||avg rate||min rate||avg time||max time||
> |<1M|0.12|0.25|0.18ms|0.7s|
> |1M ~ 10M|0.39|0.97|0.2s|0.6s|
> |10M ~ 100M|0.47|0.81|2s|6.3s|
> |>100M|0.95|0.96|15.7s|24.8s|
> Notice:
> # rate: compressed data size / original data size
> # When the source data is smaller than 1M, the compressed data may be larger
> than the source data, so row 1 of the table only counts cases where the
> compressed data is smaller than the source data.
> # In our environment, 65% of compressed results with source data < 1M are
> larger than the source data.
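The notes above define the rate as compressed size divided by original size and observe that small payloads can grow when compressed. A minimal sketch of that measurement follows, using Python's zlib (DEFLATE) as a stand-in codec. This is an assumption: the issue does not say which codec the HBase coprocessor uses, and `compression_rate` is a hypothetical helper, not a Kylin API.

```python
import os
import zlib


def compression_rate(data: bytes, level: int = 6) -> float:
    """Return compressed size / original size (the "rate" in the table).

    Assumption: zlib/DEFLATE stands in for whatever codec the HBase
    coprocessor actually uses; absolute numbers will differ.
    """
    if not data:
        raise ValueError("rate is undefined for empty input")
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    # Small, high-entropy payload: the codec's fixed overhead dominates,
    # so the "compressed" output is larger than the input (rate > 1).
    tiny = os.urandom(64)
    # Large, repetitive payload: compresses very well (rate far below 1).
    repetitive = b"row_key,measure\n" * 10_000
    print(f"64 random bytes: rate = {compression_rate(tiny):.2f}")
    print(f"repetitive rows: rate = {compression_rate(repetitive):.3f}")
```

For tiny inputs the fixed framing (zlib adds a header and a checksum, plus DEFLATE block headers) can outweigh any savings, which is one plausible reason small results come out larger than their source.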
> When the source data is smaller than 10M, the data transmission latency is
> acceptable. When the data is larger than 100M, compression takes a long
> time.
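The cost-benefit trade-off in the title can be made concrete with a rough latency model: compression only pays off if it takes less time than transferring the bytes it removes. A minimal sketch, assuming a hypothetical 100 MB/s link between the region server and the Kylin query server and ignoring decompression and RPC overhead; `ScanResult`, `send_time_s` and `break_even_compress_s` are illustrative names, not Kylin APIs.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    size_mb: float     # uncompressed result size shipped by the coprocessor
    rate: float        # compressed size / original size
    compress_s: float  # time spent compressing on the region server


def send_time_s(r: ScanResult, bandwidth_mb_s: float, compress: bool) -> float:
    """Rough latency model: optional compression, then network transfer.

    Decompression on the query server, serialization and RPC overheads
    are deliberately ignored.
    """
    if compress:
        return r.compress_s + r.size_mb * r.rate / bandwidth_mb_s
    return r.size_mb / bandwidth_mb_s


def break_even_compress_s(r: ScanResult, bandwidth_mb_s: float) -> float:
    """Longest compression time that still saves latency: the transfer
    time of the bytes that compression removes."""
    return r.size_mb * (1.0 - r.rate) / bandwidth_mb_s


if __name__ == "__main__":
    # 200M result taking ~10 s to compress (figures from the issue) at the
    # >100M row's average rate of 0.95. The 100 MB/s link is hypothetical.
    big = ScanResult(size_mb=200, rate=0.95, compress_s=10.0)
    bw = 100.0
    print(f"uncompressed: {send_time_s(big, bw, compress=False):.1f} s")  # 2.0 s
    print(f"compressed:   {send_time_s(big, bw, compress=True):.1f} s")   # 11.9 s
    print(f"break-even:   {break_even_compress_s(big, bw):.2f} s")        # 0.10 s
```

Under these assumptions compression would have to finish in about 0.1 s to break even, roughly two orders of magnitude below the observed 10 s. The balance depends on bandwidth: for a hypothetical 50M result in the 10M ~ 100M row (rate 0.47, 2 s), compression wins on a 10 MB/s link (4.35 s vs 5.0 s) but loses at 100 MB/s (2.235 s vs 0.5 s).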
>
> So I think kylin.storage.hbase.endpoint-compress-result should default to
> FALSE.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)