[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850170#comment-13850170 ]
Lars Hofhansl commented on HBASE-3484:
--------------------------------------

From [~mcorgan]...
bq. I've been pondering how to better compact the data in the memstore. Sometimes we see a 100MB memstore flush that is really 10MB of KeyValues, which gzips to like 2MB, meaning there is a ton of pointer overhead.

This should be better now. In various patches I removed:
* caching of the row key (HBASE-7279)
* caching of the timestamp (HBASE-7279)
* caching of the KV length (HBASE-9956)

That saves 12 bytes + sizeOf(rowKey) for each KeyValue in the memstore.
The in-memory overhead per KV is now 56 bytes. (The memstoreTS is also stored in the HFiles.)

> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
>                 Key: HBASE-3484
>                 URL: https://issues.apache.org/jira/browse/HBASE-3484
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: WIP_HBASE-3484.patch, hierarchical-map.txt,
> memstore_drag.png
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much
> more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to
> save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR
> 166, so we should be OK with licenses.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
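
For readers unfamiliar with the Map<KeyValue,KeyValue> pattern in the issue description, here is a minimal sketch of why it costs an extra reference per entry and why upsert is awkward with the stock ConcurrentSkipListMap. This is not HBase code: `MemstoreSetSketch`, `Cell`, `BY_ROW` and `upsert` are hypothetical names, and `Cell` is a simplified stand-in for KeyValue that is ordered by row only.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

class MemstoreSetSketch {
    /** Hypothetical stand-in for KeyValue: ordering uses the row only. */
    static final class Cell {
        final String row;
        final String value;

        Cell(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    static final Comparator<Cell> BY_ROW = Comparator.comparing((Cell c) -> c.row);

    // A "set" emulated as Map<Cell, Cell>: every entry carries a key reference
    // and a value reference, normally pointing at the same object.
    final ConcurrentSkipListMap<Cell, Cell> map = new ConcurrentSkipListMap<>(BY_ROW);

    /**
     * Upsert as remove-then-put so that the key reference is replaced too.
     * This walks the skip list twice and is not atomic as a pair.
     */
    void upsert(Cell c) {
        map.remove(c);
        map.put(c, c);
    }

    public static void main(String[] args) {
        MemstoreSetSketch s = new MemstoreSetSketch();
        Cell v1 = new Cell("row1", "v1");
        Cell v2 = new Cell("row1", "v2");

        s.map.put(v1, v1);
        s.map.put(v2, v2); // equal key: only the value reference is swapped

        System.out.println(s.map.firstKey().value);              // v1
        System.out.println(s.map.firstEntry().getValue().value); // v2

        s.upsert(v2);
        System.out.println(s.map.firstKey().value);              // v2
    }
}
```

A plain `put` on an equal key leaves the old key object in the node, which is why the sketch falls back to remove-then-put. An in-place replace on the iterator, as proposed in the description, would avoid the second traversal, and a direct Set implementation would drop the duplicate reference.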