[ 
https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412062#comment-15412062
 ] 

ramkrishna.s.vasudevan commented on HBASE-15554:
------------------------------------------------

bq. So we will need some sort of logic in the impl method to map the incoming 
offset to the correct area. Like offset 0 and 1 to return rk len, 2 to 
<rklen>+2 to return rk bytes, and so on. Need some logic but I don't think that 
is going to be a very heavy op. 
I think that is what the patch does now. We are mapping the offset that the 
hash algo calculates to the cell offset. 
bq. The key difference in what I was suggesting is that instead of having 
duplicated methods in Hash, we have one which works on a HashKey (I just call 
it that way) and we have different impls of the HashKey depending on the type 
of bloom and so the bytes it uses.
Now there are no duplicate methods.
I think wrt Hash algos, the algo will decide which offsets to read based on the 
offset and length that we pass. It can be anything, and based on that the algo 
decides which bytes are to be retrieved and compared. 
So now the byteExtractor will take that offset and determine which offset in 
the cell has to be retrieved. 
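
To make the offset mapping concrete, here is a minimal sketch. It is not the 
code in the patch: the names HashKey, ArrayHashKey and RowHashKey are 
hypothetical, the layout is reduced to [2-byte row length][row bytes], and the 
FNV-style hash is only a placeholder for the real hash algo. It shows one hash 
method reading through a key view, so a row held on its own hashes the same as 
the serialized bytes without creating a copy.

```java
import java.nio.charset.StandardCharsets;

public class HashKeySketch {
  /** Hypothetical abstraction: the hash reads bytes only through this view. */
  interface HashKey {
    byte get(int offset);
    int length();
  }

  /** Contiguous key bytes, e.g. an already serialized key. */
  static final class ArrayHashKey implements HashKey {
    private final byte[] buf;
    ArrayHashKey(byte[] buf) { this.buf = buf; }
    public byte get(int offset) { return buf[offset]; }
    public int length() { return buf.length; }
  }

  /** Row held separately; exposes the virtual layout [2-byte row length][row bytes]. */
  static final class RowHashKey implements HashKey {
    private final byte[] row;
    RowHashKey(byte[] row) { this.row = row; }
    public byte get(int offset) {
      if (offset == 0) return (byte) (row.length >> 8); // high byte of row length
      if (offset == 1) return (byte) row.length;        // low byte of row length
      return row[offset - 2];                           // offsets 2.. map into the row
    }
    public int length() { return 2 + row.length; }
  }

  /** Placeholder hash (FNV-1a style); an assumption, not the hash HBase uses. */
  static int hash(HashKey key, int seed) {
    int h = 0x811C9DC5 ^ seed;
    for (int i = 0; i < key.length(); i++) {
      h ^= key.get(i) & 0xFF;
      h *= 0x01000193;
    }
    return h;
  }

  public static void main(String[] args) {
    byte[] row = "row-1".getBytes(StandardCharsets.UTF_8);
    // Build the serialized form by copying, only for comparison.
    byte[] serialized = new byte[2 + row.length];
    serialized[0] = (byte) (row.length >> 8);
    serialized[1] = (byte) row.length;
    System.arraycopy(row, 0, serialized, 2, row.length);
    int viaView = hash(new RowHashKey(row), 0);
    int viaCopy = hash(new ArrayHashKey(serialized), 0);
    System.out.println(viaView == viaCopy); // prints true
  }
}
```

A ROWCOL key would need more branches in get() for the family and qualifier 
parts, but the hash method itself would stay unchanged.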
Why I say a Key interface would be helpful: 
-> One is that whatever the cell format is, we finally have to assume the back 
end is the KV format key only, because the offset and length that we pass to 
the hash algo assume that the bytes are contiguous. In future, if we have a 
Cell where every part of the cell is an individual byte[] or BB, so that 
everything starts at offset 0 and runs to some length, then I am not sure how 
the hash algo is going to work. Maybe we should find the hash for every 
component like row and col and then find the final hash. One thing I need to 
admit is that I am not sure why the hash is calculated twice: first with seed 0 
and next with the first calculated hash value as the seed. I may need to read 
the paper that describes the algo.
-> What I thought is that if we have a Key interface and we feel that the 
serialized format is not that of KV, at least we can do the copy.
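
On the two hash calls: one plausible reading is the double hashing scheme for 
Bloom filters (Kirsch and Mitzenmacher), where two base hashes are combined as 
hash1 + i * hash2 to derive all the bit positions, instead of running a 
separate hash function per position. The sketch below only illustrates that 
technique under that assumption. The class name and the FNV-style hash are 
placeholders, not the HBase implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class DoubleHashBloomSketch {
  private final BitSet bits;
  private final int numBits;
  private final int hashCount;

  DoubleHashBloomSketch(int numBits, int hashCount) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.hashCount = hashCount;
  }

  /** Placeholder seeded hash (FNV-1a style); an assumption for illustration. */
  static int hash(byte[] key, int seed) {
    int h = 0x811C9DC5 ^ seed;
    for (byte b : key) {
      h ^= b & 0xFF;
      h *= 0x01000193;
    }
    return h;
  }

  /** i-th bit position derived from the two base hashes. */
  private int position(int hash1, int hash2, int i) {
    long combined = (long) hash1 + (long) i * hash2;
    return (int) Math.floorMod(combined, (long) numBits);
  }

  void add(byte[] key) {
    int hash1 = hash(key, 0);     // first pass: seed 0
    int hash2 = hash(key, hash1); // second pass: seeded with the first result
    for (int i = 0; i < hashCount; i++) {
      bits.set(position(hash1, hash2, i));
    }
  }

  boolean mightContain(byte[] key) {
    int hash1 = hash(key, 0);
    int hash2 = hash(key, hash1);
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(position(hash1, hash2, i))) {
        return false; // definitely not added
      }
    }
    return true; // possibly added (false positives are allowed)
  }

  public static void main(String[] args) {
    DoubleHashBloomSketch bloom = new DoubleHashBloomSketch(1024, 4);
    bloom.add("row-1/col-a".getBytes(StandardCharsets.UTF_8));
    System.out.println(bloom.mightContain("row-1/col-a".getBytes(StandardCharsets.UTF_8)));
  }
}
```

If this reading is right, only two passes over the key bytes are needed no 
matter how many bit positions the filter sets per key.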


> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
>                 Key: HBASE-15554
>                 URL: https://issues.apache.org/jira/browse/HBASE-15554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Vladimir Rodionov
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15554.patch, HBASE-15554_3.patch, 
> HBASE-15554_4.patch, HBASE-15554_6.patch, HBASE-15554_7.patch, 
> HBASE-15554_9.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType 
> is ROWCOL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
