[
https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278186#comment-15278186
]
ramkrishna.s.vasudevan commented on HBASE-15554:
------------------------------------------------
bq.It can avoid copy when the entire Key is in one buffer only right? The
KeyValue way?
Exactly. So mark those Cells whose key is backed by a single structure,
either a byte[] or a BB. So we will have KeyValue, OffheapKeyValue,
OnheapDecodedCell and OffheapDecodedCell. All these are our internal cell
representations. We can mark these cells with a special interface type and add
getKeyArray() and getKeyBuffer() APIs to them. KeyValue already has getBuffer(),
which we thought of deprecating, but in a write flow such as this we will need
it.
In the bloom filter addition, check whether the cell is of the new interface
type. If so, just retrieve the key part from the cell directly, and ensure that
the hash algos/Bloom filter interfaces have APIs to deal with ByteBuffers. With
that we can avoid this copy.
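
A rough sketch of that check, to make the idea concrete. Everything here is
hypothetical and is not HBase API: ContiguousKeyCell stands in for the proposed
marker interface, SimpleCell for a generic cell that can only hand out a copy of
its key, and the FNV-1a hash is just a placeholder for whatever hash the bloom
filter really uses. The point is only that the same hash can run over a byte[]
or a ByteBuffer in place, with the copy kept as a fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BloomKeySketch {

  /** Hypothetical stand-in for a generic cell: the key is only available as a copy. */
  interface SimpleCell {
    byte[] copyKey();
  }

  /** Hypothetical marker interface: the whole key sits in one byte[] or one ByteBuffer. */
  interface ContiguousKeyCell extends SimpleCell {
    byte[] getKeyArray();       // non-null when the key is backed by a byte[]
    ByteBuffer getKeyBuffer();  // non-null when the key is backed by a ByteBuffer
    int getKeyOffset();
    int getKeyLength();
  }

  /** Minimal cell used only for this demo; exactly one of array/buffer is non-null. */
  static final class DemoCell implements ContiguousKeyCell {
    private final byte[] array;
    private final ByteBuffer buffer;
    private final int offset;
    private final int length;

    DemoCell(byte[] array, ByteBuffer buffer, int offset, int length) {
      this.array = array;
      this.buffer = buffer;
      this.offset = offset;
      this.length = length;
    }

    public byte[] getKeyArray() { return array; }
    public ByteBuffer getKeyBuffer() { return buffer; }
    public int getKeyOffset() { return offset; }
    public int getKeyLength() { return length; }

    public byte[] copyKey() {
      byte[] out = new byte[length];
      for (int i = 0; i < length; i++) {
        out[i] = (array != null) ? array[offset + i] : buffer.get(offset + i);
      }
      return out;
    }
  }

  /** Placeholder hash (FNV-1a, 32 bit) over a byte[] range. */
  static int hash(byte[] key, int offset, int length) {
    int h = 0x811c9dc5;
    for (int i = offset; i < offset + length; i++) {
      h ^= (key[i] & 0xff);
      h *= 0x01000193;
    }
    return h;
  }

  /** Same hash over a ByteBuffer range, using absolute gets so position is untouched. */
  static int hash(ByteBuffer key, int offset, int length) {
    int h = 0x811c9dc5;
    for (int i = offset; i < offset + length; i++) {
      h ^= (key.get(i) & 0xff);
      h *= 0x01000193;
    }
    return h;
  }

  /** Hash the key in place when the cell exposes it; copy only as a fallback. */
  static int hashForBloom(SimpleCell cell) {
    if (cell instanceof ContiguousKeyCell) {
      ContiguousKeyCell c = (ContiguousKeyCell) cell;
      if (c.getKeyArray() != null) {
        return hash(c.getKeyArray(), c.getKeyOffset(), c.getKeyLength());
      }
      return hash(c.getKeyBuffer(), c.getKeyOffset(), c.getKeyLength());
    }
    byte[] copy = cell.copyKey(); // the allocation we are trying to avoid
    return hash(copy, 0, copy.length);
  }

  public static void main(String[] args) {
    byte[] key = "row1/cf:qual".getBytes(StandardCharsets.UTF_8);
    ByteBuffer direct = ByteBuffer.allocateDirect(key.length);
    direct.put(key);
    SimpleCell onheap = new DemoCell(key, null, 0, key.length);
    SimpleCell offheap = new DemoCell(null, direct, 0, key.length);
    // Both paths should agree, since they hash the same bytes.
    System.out.println(hashForBloom(onheap) == hashForBloom(offheap));
  }
}
```

In a real patch the two hash overloads would correspond to the existing hash
implementations gaining ByteBuffer variants; the instanceof check is the only
part that depends on the new interface.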
But the catch is that we cannot write the blooms without the CF name. We have
to write them with the CF name, and hence I was suggesting that a major
compaction would be needed. If we see this information as redundant then we
cannot do anything without copying the Key part to a byte[] and doing as
attached in the first patch here.
> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
> Key: HBASE-15554
> URL: https://issues.apache.org/jira/browse/HBASE-15554
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: Vladimir Rodionov
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-15554.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType
> is ROWCOL.