[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk

zhangduo (JIRA) Wed, 15 Oct 2014 03:46:30 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172250#comment-14172250
 ]


zhangduo commented on HBASE-10201:
----------------------------------

Ran the same benchmark on a cluster of 3 regionservers (2 * Xeon E5-2650 2.6G, 
3T * 11 SATA); the result is similar.

Without per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 9,
metric_memStoreSize: 39965016,
metric_storeFileSize: 4460709275,
metric_compactionsCompletedCount: 46,
metric_numBytesCompactedCount: 11030906070,
metric_numFilesCompactedCount: 145,
Write amplification: 2.47

With per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 7,
metric_memStoreSize: 110195648,
metric_storeFileSize: 4369570622,
metric_compactionsCompletedCount: 27,
metric_numBytesCompactedCount: 10353718691,
metric_numFilesCompactedCount: 89,
Write amplification: 2.37

The patch has a big impact on compactionsCompletedCount, but a small impact on 
numBytesCompactedCount. This is reasonable: the patch only prevents flushing 
small files for small CFs and so reduces their compaction count, but most of 
numBytesCompactedCount is contributed by large CFs, which are not affected (or 
at least only very slightly) by this patch. So we only get a small improvement 
in write amplification (5%~10%).
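
The write amplification figures above appear to equal numBytesCompactedCount 
divided by storeFileSize. That formula is inferred from the reported numbers 
and is not stated in the comment, so treat the following Python as a minimal 
sketch under that assumption. The dictionary keys and function names are 
illustrative only; they are not HBase APIs.

```python
# Metrics copied from the two benchmark runs reported above.
RUNS = {
    "without per CF flush": {
        "store_file_size": 4460709275,
        "bytes_compacted": 11030906070,
        "compactions_completed": 46,
        "files_compacted": 145,
    },
    "with per CF flush": {
        "store_file_size": 4369570622,
        "bytes_compacted": 10353718691,
        "compactions_completed": 27,
        "files_compacted": 89,
    },
}


def write_amplification(run):
    """Assumed definition: bytes rewritten by compaction / final store file size."""
    return run["bytes_compacted"] / run["store_file_size"]


def relative_reduction(before, after):
    """Fractional decrease going from `before` to `after`."""
    return (before - after) / before


baseline = RUNS["without per CF flush"]
patched = RUNS["with per CF flush"]

for name, run in RUNS.items():
    print(f"{name}: write amplification = {write_amplification(run):.2f}")

# Compare how much each compaction metric drops with the patch applied.
for key in ("compactions_completed", "files_compacted", "bytes_compacted"):
    drop = relative_reduction(baseline[key], patched[key])
    print(f"{key}: reduced by {drop:.1%}")
```

With these inputs the number of completed compactions falls by roughly 41%, 
while the bytes compacted fall by only about 6%, which matches the 
observation that the large CFs dominate the compaction volume.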


> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Yu
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, 
> HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch
>
>
> Currently the flush decision is made using the aggregate size of all column 
> families. When large and small column families co-exist, this causes many 
> small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
