[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172250#comment-14172250 ]
zhangduo commented on HBASE-10201:
----------------------------------
Ran the same benchmark on a 3-regionserver cluster (2 × Xeon E5-2650 2.6 GHz,
11 × 3 TB SATA disks); the result is similar.
Without per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 9,
metric_memStoreSize: 39965016,
metric_storeFileSize: 4460709275,
metric_compactionsCompletedCount: 46,
metric_numBytesCompactedCount: 11030906070,
metric_numFilesCompactedCount: 145,
Write amplification: 2.47
With per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 7,
metric_memStoreSize: 110195648,
metric_storeFileSize: 4369570622,
metric_compactionsCompletedCount: 27,
metric_numBytesCompactedCount: 10353718691,
metric_numFilesCompactedCount: 89,
Write amplification: 2.37
The patch has a big impact on compactionsCompletedCount (46 to 27, about 41%
fewer) but a small impact on numBytesCompactedCount (about 6% fewer). This is
reasonable: the patch only prevents flushing small files from small CFs, which
reduces the number of compactions on those CFs, but most of
numBytesCompactedCount comes from the large CFs, which this patch does not
affect (or affects only slightly). So we only get a small improvement in write
amplification (5%~10%).
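The write-amplification figures above appear to be numBytesCompactedCount divided by storeFileSize; that ratio reproduces both 2.47 and 2.37, but the definition is inferred here, not stated in the comment. A minimal Python sketch of that arithmetic, plus the relative reductions that separate "many fewer compactions" from "slightly fewer compacted bytes":

```python
# Metrics copied from the two benchmark runs above (same 3-regionserver cluster).
without_per_cf = {
    "storeFileSize": 4460709275,
    "compactionsCompletedCount": 46,
    "numBytesCompactedCount": 11030906070,
    "numFilesCompactedCount": 145,
}
with_per_cf = {
    "storeFileSize": 4369570622,
    "compactionsCompletedCount": 27,
    "numBytesCompactedCount": 10353718691,
    "numFilesCompactedCount": 89,
}


def write_amplification(m):
    """Bytes rewritten by compaction per byte currently stored on disk.

    Assumed definition: numBytesCompactedCount / storeFileSize. It matches the
    2.47 and 2.37 quoted above, but is inferred rather than taken from the patch.
    """
    return m["numBytesCompactedCount"] / m["storeFileSize"]


def reduction(before, after):
    """Relative reduction from before to after, as a fraction."""
    return (before - after) / before


wa_before = write_amplification(without_per_cf)
wa_after = write_amplification(with_per_cf)
print(f"write amplification: {wa_before:.2f} -> {wa_after:.2f}")
for key in ("compactionsCompletedCount", "numFilesCompactedCount",
            "numBytesCompactedCount"):
    fewer = reduction(without_per_cf[key], with_per_cf[key])
    print(f"{key}: {fewer:.1%} fewer")
```

The count metrics drop by roughly 40% while compacted bytes drop by only about 6%, which is the gap the comment attributes to the large CFs dominating compaction I/O.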
> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
> Key: HBASE-10201
> URL: https://issues.apache.org/jira/browse/HBASE-10201
> Project: HBase
> Issue Type: Improvement
> Reporter: Ted Yu
> Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch,
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch
>
>
> Currently the flush decision is made using the aggregate size of all column
> families. When large and small column families co-exist, this causes many
> small flushes of the smaller CF. We need to make per-CF flush decisions.
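The per-CF flush decision described in the issue can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the patch's code: `ToyRegion`, `flush_size`, `per_cf_lower_bound`, the unit sizes, and the rule "if no CF reaches the lower bound, flush all CFs" are all hypothetical choices made for illustration. The flush is still triggered by the region's aggregate memstore size; only the choice of which CFs to flush changes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    """Toy model of one region's memstores; not HBase code.

    flush_size and per_cf_lower_bound are hypothetical stand-ins for a region
    flush threshold and a per-CF threshold, in arbitrary units.
    """
    flush_size: int
    per_cf_lower_bound: int
    per_cf: bool
    memstore: dict = field(default_factory=dict)       # CF -> buffered bytes
    flushed_files: dict = field(default_factory=dict)  # CF -> sizes of flushed files

    def put(self, cf: str, nbytes: int) -> None:
        self.memstore[cf] = self.memstore.get(cf, 0) + nbytes
        # The flush trigger is still the aggregate size of all CFs.
        if sum(self.memstore.values()) >= self.flush_size:
            self.flush()

    def stores_to_flush(self) -> list:
        if not self.per_cf:
            return list(self.memstore)  # aggregate policy: flush every CF
        large = [cf for cf, size in self.memstore.items()
                 if size >= self.per_cf_lower_bound]
        # Assumed fallback: if no single CF is large enough, flush them all.
        return large or list(self.memstore)

    def flush(self) -> None:
        for cf in self.stores_to_flush():
            if self.memstore[cf] > 0:
                self.flushed_files.setdefault(cf, []).append(self.memstore[cf])
                self.memstore[cf] = 0


def simulate(per_cf: bool, rounds: int = 100) -> ToyRegion:
    region = ToyRegion(flush_size=100, per_cf_lower_bound=20, per_cf=per_cf)
    for _ in range(rounds):
        region.put("large", 9)  # the large CF receives 90% of the writes
        region.put("small", 1)
    return region


if __name__ == "__main__":
    for per_cf in (False, True):
        region = simulate(per_cf)
        label = "per-CF   " if per_cf else "aggregate"
        print(label, "small CF files:", region.flushed_files["small"])
```

With these made-up numbers, the aggregate policy writes ten 10-unit files for the small CF over 100 rounds, while the per-CF policy writes three larger files (27, 28, 27) and keeps the remainder buffered. The large CF is flushed ten times under either policy, which mirrors the observation that the large CFs are mostly unaffected.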