[ https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412062#comment-15412062 ]
ramkrishna.s.vasudevan commented on HBASE-15554:
------------------------------------------------
bq. So we will need some sort of logic in the impl method to map the incoming
offset to the correct area: offsets 0 and 1 return the rk len, offsets 2 to
<rklen>+2 return the rk bytes, and so on. It needs some logic, but I don't think
it is going to be a very heavy op.
I think that is what the patch does now: we map the offset that the hash algo
calculates to the corresponding offset in the cell.
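A minimal sketch of that offset mapping, assuming a simplified virtual key made of a 2-byte big-endian row length, then the row bytes, then the qualifier bytes. The real bloom key layout has more fields, and the names `OffsetMapping`, `byteAt` and `materialize` are hypothetical, not the patch's API. Each logical offset resolves through a couple of comparisons, so no contiguous copy of the key has to be allocated.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: maps an offset in a virtual key
 * [2-byte row length][row bytes][qualifier bytes]
 * onto the separate arrays that hold the cell's parts.
 */
class OffsetMapping {

  // Returns the byte at logical position pos of the virtual key.
  static byte byteAt(byte[] row, byte[] qualifier, int pos) {
    if (pos == 0) {
      return (byte) (row.length >>> 8);   // high byte of the row length
    }
    if (pos == 1) {
      return (byte) row.length;           // low byte of the row length
    }
    int p = pos - 2;
    if (p < row.length) {
      return row[p];                      // offsets 2 .. rowLen+1 map into the row
    }
    p -= row.length;
    if (p < qualifier.length) {
      return qualifier[p];                // remaining offsets map into the qualifier
    }
    throw new IndexOutOfBoundsException("pos " + pos);
  }

  static int keyLength(byte[] row, byte[] qualifier) {
    return 2 + row.length + qualifier.length;
  }

  // Builds the same key as one contiguous array (what a copy-based path would allocate).
  static byte[] materialize(byte[] row, byte[] qualifier) {
    byte[] key = new byte[keyLength(row, qualifier)];
    key[0] = (byte) (row.length >>> 8);
    key[1] = (byte) row.length;
    System.arraycopy(row, 0, key, 2, row.length);
    System.arraycopy(qualifier, 0, key, 2 + row.length, qualifier.length);
    return key;
  }

  public static void main(String[] args) {
    byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
    byte[] qual = "q".getBytes(StandardCharsets.UTF_8);
    byte[] copy = materialize(row, qual);
    // Every logical offset must yield the same byte as the copied key.
    for (int i = 0; i < copy.length; i++) {
      if (byteAt(row, qual, i) != copy[i]) {
        throw new AssertionError("mismatch at " + i);
      }
    }
    System.out.println("virtual key matches copied key, length " + copy.length);
  }
}
```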
bq. The Key diff I was suggesting is: instead of having duplicated methods in
Hash, we have one method that works on a HashKey (I just call it that) and
different impls of the HashKey depending on the type of bloom, and so on the
bytes it uses.
Now there are no duplicate methods.
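The single-method design from the quoted suggestion could look roughly like this. `HashKey`, `RowHashKey`, `RowColHashKey` and `HashKeyDemo` are hypothetical names (the quote itself says HashKey is only a working name), the ROWCOL layout is simplified to row bytes followed by qualifier bytes, and 32-bit FNV-1a is a stand-in for HBase's actual hash implementations. The hash only sees a `HashKey`, so supporting another bloom type means adding an implementation rather than another hash method.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical view that a hash function reads bytes through.
interface HashKey {
  byte get(int pos);
  int length();
}

// ROW bloom: the hashed bytes are just the row.
final class RowHashKey implements HashKey {
  private final byte[] row;

  RowHashKey(byte[] row) { this.row = row; }

  public byte get(int pos) { return row[pos]; }

  public int length() { return row.length; }
}

// ROWCOL bloom: row bytes followed by qualifier bytes (simplified layout, assumed).
final class RowColHashKey implements HashKey {
  private final byte[] row;
  private final byte[] qualifier;

  RowColHashKey(byte[] row, byte[] qualifier) {
    this.row = row;
    this.qualifier = qualifier;
  }

  public byte get(int pos) {
    return pos < row.length ? row[pos] : qualifier[pos - row.length];
  }

  public int length() { return row.length + qualifier.length; }
}

class HashKeyDemo {
  // One hash method for every bloom type: 32-bit FNV-1a, used only as a stand-in.
  static int fnv1a(HashKey key) {
    int h = 0x811C9DC5;                 // FNV offset basis
    for (int i = 0; i < key.length(); i++) {
      h ^= key.get(i) & 0xFF;
      h *= 0x01000193;                  // FNV prime, wraps modulo 2^32
    }
    return h;
  }

  public static void main(String[] args) {
    byte[] abc = "abc".getBytes(StandardCharsets.UTF_8);
    byte[] ab = "ab".getBytes(StandardCharsets.UTF_8);
    byte[] c = "c".getBytes(StandardCharsets.UTF_8);
    int rowHash = fnv1a(new RowHashKey(abc));
    int rowColHash = fnv1a(new RowColHashKey(ab, c));
    System.out.println(rowHash == rowColHash); // prints "true"
  }
}
```

Because the hash reads bytes only by position, a ROWCOL key over row "ab" and qualifier "c" hashes exactly like a ROW key over "abc". The flip side is that each `HashKey` implementation has to reproduce the byte sequence the copy-based code hashed if existing bloom filters are to stay valid.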
I think, wrt the Hash algos, the algo decides which offsets to read based on the
offset and length that we pass. It can be anything, and based on that the algo
decides which bytes are retrieved and compared.
So now the byteExtractor uses that offset to determine which offset in the
cell has to be retrieved.
Why I say a Key interface would be helpful:
-> One reason is that, whatever the cell format, we finally have to assume the
back end is the KV-format key only, because the offset and length that we pass
to the hash algo assume the bytes are contiguous. In future, if we have a Cell
where every part is an individual byte[] or BB, so that every part starts at
offset 0 with its own length, I am not sure how the hash algo is going to work.
Maybe we should find the hash for every component, like row and col, and then
compute the final hash. One thing I have to admit is that I am not sure why the
hash is calculated twice: first with seed 0, and then with the first calculated
hash value as the seed. Maybe I need to read the paper that describes the algo.
-> What I thought is that if we have a Key interface and we find that the
serialized format is not that of KV, at least we can do the copy.
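On why the hash is computed twice: a common Bloom filter pattern, and the one this sketch assumes, is double hashing, where the i-th of k bit positions is derived as (h1 + i * h2) mod m from just two hash values instead of k independent hash functions. Here h1 is the hash with seed 0 and h2 is the hash seeded with h1, mirroring the two calls described above. Whether this is exactly HBase's reason is an assumption; `DoubleHashingBloom` and its seeded FNV-1a hash are illustrative stand-ins, not HBase classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Illustrative Bloom filter using double hashing (not an HBase class).
class DoubleHashingBloom {
  private final BitSet bits;
  private final int numBits;
  private final int hashCount;

  DoubleHashingBloom(int numBits, int hashCount) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.hashCount = hashCount;
  }

  // Stand-in seeded hash: FNV-1a with the seed folded into the start state (assumed, not HBase's hash).
  static int hash(byte[] data, int seed) {
    int h = 0x811C9DC5 ^ seed;
    for (byte b : data) {
      h ^= b & 0xFF;
      h *= 0x01000193;
    }
    return h;
  }

  // Derives the i-th bit position as (h1 + i * h2) mod numBits, kept non-negative.
  private int position(int h1, int h2, int i) {
    long combined = (long) h1 + (long) i * h2;
    return (int) Math.floorMod(combined, (long) numBits);
  }

  void add(byte[] key) {
    int h1 = hash(key, 0);   // first pass, seed 0
    int h2 = hash(key, h1);  // second pass, seeded with the first result
    for (int i = 0; i < hashCount; i++) {
      bits.set(position(h1, h2, i));
    }
  }

  boolean mightContain(byte[] key) {
    int h1 = hash(key, 0);
    int h2 = hash(key, h1);
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(position(h1, h2, i))) {
        return false;        // a clear bit means the key was never added
      }
    }
    return true;             // added, or a false positive
  }

  public static void main(String[] args) {
    DoubleHashingBloom bloom = new DoubleHashingBloom(1 << 12, 5);
    byte[] present = "row1/cf:q1".getBytes(StandardCharsets.UTF_8);
    bloom.add(present);
    System.out.println(bloom.mightContain(present)); // prints "true"
    System.out.println(bloom.mightContain("row2/cf:q1".getBytes(StandardCharsets.UTF_8)));
  }
}
```

Only two hash evaluations are needed per key regardless of `hashCount`, which is the saving the scheme is meant to provide.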
> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
> Key: HBASE-15554
> URL: https://issues.apache.org/jira/browse/HBASE-15554
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: Vladimir Rodionov
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-15554.patch, HBASE-15554_3.patch,
> HBASE-15554_4.patch, HBASE-15554_6.patch, HBASE-15554_7.patch,
> HBASE-15554_9.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType
> is ROWCOL.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)