[ https://issues.apache.org/jira/browse/KYLIN-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014986#comment-17014986 ]
ASF subversion and git services commented on KYLIN-4322:
--------------------------------------------------------

Commit f41c6c8198e5cad295e9212c6a0047d83bd54ae2 in kylin's branch refs/heads/master from Kang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f41c6c8 ]

KYLIN-4322: set storage.hbase.endpoint-compress-result default value … (#1033)

* KYLIN-4322: set storage.hbase.endpoint-compress-result default value false
* KYLIN-4322: update UT

> Cost–benefit of compression HBase result
> ----------------------------------------
>
>                 Key: KYLIN-4322
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4322
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: ZhouKang
>            Priority: Major
>
> kylin.storage.hbase.endpoint-compress-result is TRUE by default.
> In our production environment, when the HBase scan result is larger than
> 200M, it takes more than 10s to compress the data.
> We can see this in HBase's log:
> ||Size||avg rate||min rate||avg time||max time||
> |<1M|0.12|0.25|0.18ms|0.7s|
> |1M ~ 10M|0.39|0.97|0.2s|0.6s|
> |10M ~ 100M|0.47|0.81|2s|6.3s|
> |>100M|0.95|0.96|15.7s|24.8s|
> Notice:
> # rate: compressed data size / original data size
> # When the source data size is < 1M, the compressed data may be larger than
> the source data, so Row 1 of the table only counts the cases where the
> compressed data is smaller than the source data.
> # In our environment, 65% of the compressed results (<1M) are larger than
> the source data.
> When the source data is less than 10M, the latency of data transmission is
> acceptable. When the data is larger than 100M, it takes a long time to
> compress the data.
>
> So, I think kylin.storage.hbase.endpoint-compress-result should be FALSE by
> default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
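
The "rate" in the quoted issue is the compressed size divided by the original size, so a rate above 1 means compression made the payload larger. A minimal Python sketch of that measurement is given here for illustration. It uses zlib purely as a stand-in codec, since the issue does not say which codec the HBase endpoint uses, and `measure_compression` is a hypothetical helper, not Kylin code.

```python
import os
import time
import zlib
from typing import Tuple


def measure_compression(data: bytes) -> Tuple[float, float]:
    """Return (rate, seconds) for compressing a non-empty payload.

    rate = compressed size / original size, as defined in the issue.
    zlib is an assumed stand-in; the real endpoint codec may differ.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(data), elapsed


if __name__ == "__main__":
    samples = {
        # Small, incompressible payload: codec overhead can push the rate above 1.
        "small random (100 B)": os.urandom(100),
        # Highly repetitive payload: compresses very well.
        "repetitive (1 MB)": b"kylin" * 200_000,
    }
    for name, payload in samples.items():
        rate, seconds = measure_compression(payload)
        print(f"{name}: rate={rate:.2f}, time={seconds * 1000:.2f} ms")
```

Timings from a sketch like this depend on the machine and codec and are not comparable to the production figures in the issue; the point is only how rate and time are derived per payload.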