[ 
https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412234#comment-15412234
 ] 

Anoop Sam John commented on HBASE-15554:
----------------------------------------

Sorry if I was not clear. I don't mean that the patch still has duplication.
What I meant by an iterator-based HashKey is that I wanted it to be the single
structure we use with Hash, rather than byte[]/BB/Cell. But if the algorithm
demands an offset-based byte getter, I am fine with that.
bq.one is that what ever be the cell format we should finally assume the back 
end is KV format key only. Because the offset and length that we pass to the 
hash algo is assuming that it is continuous
Why do we need to pass an offset to the hash() function? We need to pass a
HashKey. Internally, the HashKey impl has to know which byte to return when
the getters are called on it. Yes, if you don't have the iterator model you
will have get(int), which returns a byte. So the Hash functions have to call
get() with relative offsets, e.g. get(0), get(1), etc., not the current way of
offset+1, offset+2. When the impl gets these calls, it has to convert them
into absolute offsets. That is not so simple in the ROW_COL case. There, based
on the incoming offset, you also have to map it to the area of the Cell it
belongs to. That is what I was trying to say. When get(0) or get(1) is called,
those fall in the rkLen part. get(2) through get(<rkLen>+1) belong to the rk
bytes. So the impl will have to do some math. That way you really don't have
to assume that the Cell is in KV serialization. Just continue to use the same
bytes of the Cell that were used in the past.
Am I making it clear now? It would be good if we could remove any sort of KV
assumption from the code path. I think it is pending only in this Bloom area.
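
To make the idea concrete, here is a rough sketch of what I have in mind. It
is only an illustration, not the patch and not existing HBase API: the names
HashKey, ByteArrayHashKey, RowColHashKey and SimpleHash are hypothetical, the
hash is a toy, and the ROW_COL layout is simplified to a 2-byte row length,
the row bytes, then the qualifier bytes (the real key has more fields).

```java
// Hypothetical sketch; none of these types are the actual HBase classes.
interface HashKey {
    // pos is relative to the start of the key: 0 .. length() - 1.
    byte get(int pos);
    int length();
}

// Key backed by one contiguous byte[] range; relative -> absolute is a plain add.
final class ByteArrayHashKey implements HashKey {
    private final byte[] buf;
    private final int offset;
    private final int length;

    ByteArrayHashKey(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    public byte get(int pos) { return buf[offset + pos]; }
    public int length() { return length; }
}

// ROW_COL style key served from separate row and qualifier arrays, so no
// temporary serialized key has to be allocated. Assumed (simplified) layout:
// [2-byte row length][row bytes][qualifier bytes].
final class RowColHashKey implements HashKey {
    private static final int ROW_LEN_SIZE = 2;
    private final byte[] row;
    private final byte[] qualifier;

    RowColHashKey(byte[] row, byte[] qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }

    public int length() { return ROW_LEN_SIZE + row.length + qualifier.length; }

    public byte get(int pos) {
        if (pos < 0 || pos >= length()) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        if (pos == 0) return (byte) (row.length >> 8); // high byte of rkLen
        if (pos == 1) return (byte) row.length;        // low byte of rkLen
        int rel = pos - ROW_LEN_SIZE;
        if (rel < row.length) return row[rel];         // rk bytes
        return qualifier[rel - row.length];            // qualifier bytes
    }
}

// Toy hash that only ever asks for relative positions get(0), get(1), ...
final class SimpleHash {
    static int hash(HashKey key) {
        int h = 1;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.get(i);
        }
        return h;
    }
}
```

With this shape the hash function never sees an absolute offset, and a
RowColHashKey gives the same result as a ByteArrayHashKey over the equivalent
serialized bytes, so the Bloom code would not need to assume KV serialization.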

> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
>                 Key: HBASE-15554
>                 URL: https://issues.apache.org/jira/browse/HBASE-15554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Vladimir Rodionov
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15554.patch, HBASE-15554_10.patch, 
> HBASE-15554_3.patch, HBASE-15554_4.patch, HBASE-15554_6.patch, 
> HBASE-15554_7.patch, HBASE-15554_9.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType 
> is ROWCOL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)