[ 
https://issues.apache.org/jira/browse/HBASE-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409344#comment-15409344
 ] 

Anoop Sam John commented on HBASE-15554:
----------------------------------------

Following up on the discussion about which functions to add to Hash:
we had hash(byte[]), and the patch adds
 - hash(BB) to handle the ROW bloom type for off-heap cells
 - hash(Cell) to handle the ROW_COL bloom type. This avoids the need to copy
the row and qualifier to recreate a Cell with a null CF.

So the question is this: adding these variants adds a lot of duplicate code,
and it looks a bit ugly. Can we have only one function, hash(Cell)? That
removes the duplicate code paths, but it would be very ugly to pass a Cell and
a bloom type to the hash functions. The hash function would then have to know
which Cell components to use for each bloom type!

Thinking about this, here is one idea I am proposing.
Have only one function in Hash: hash(HashKey).
Let HashKey be something like an Iterable, so that we can iterate over the
bytes that make up the key. There would be two impls of HashKey, one for the
ROW type and another for the ROW_COL type. Each impl keeps a reference to the
Cell, and its iterator knows which bytes of the Cell to consider: the ROW type
takes the row bytes only (either from a byte[] or a BB), while the ROW_COL type
takes the row bytes first, followed by the qualifier bytes. This alone is a
considerable amount of change and is worth doing as a sub-task. If you want, I
can do it as a PoC first. I have not checked this against the code at all, but
it looks like we can do it in a clean way.
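
A rough sketch of how this could look, only to illustrate the shape of the
idea and not taken from any patch. Everything here is an assumption:
SimpleCell is a hypothetical stand-in for Cell, the HashKey method names
(length, get) are made up, positional access is used instead of a real
java.lang.Iterable to avoid boxing each byte, and FNV-1a is only a
placeholder for whatever hash Hash actually implements.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical abstraction: exposes the bytes to hash without copying them.
interface HashKey {
    int length();
    byte get(int pos);
}

// Hypothetical stand-in for a Cell. Each buffer may be on heap
// (ByteBuffer.wrap) or off heap (ByteBuffer.allocateDirect).
final class SimpleCell {
    final ByteBuffer row;
    final ByteBuffer qualifier;

    SimpleCell(ByteBuffer row, ByteBuffer qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }
}

// ROW type: only the row bytes take part in the hash.
final class RowHashKey implements HashKey {
    private final SimpleCell cell;

    RowHashKey(SimpleCell cell) { this.cell = cell; }

    public int length() { return cell.row.remaining(); }

    public byte get(int pos) { return cell.row.get(cell.row.position() + pos); }
}

// ROW_COL type: row bytes first, followed by the qualifier bytes.
final class RowColHashKey implements HashKey {
    private final SimpleCell cell;

    RowColHashKey(SimpleCell cell) { this.cell = cell; }

    public int length() { return cell.row.remaining() + cell.qualifier.remaining(); }

    public byte get(int pos) {
        int rowLen = cell.row.remaining();
        if (pos < rowLen) {
            return cell.row.get(cell.row.position() + pos);
        }
        return cell.qualifier.get(cell.qualifier.position() + pos - rowLen);
    }
}

// Plain byte[] key, so existing hash(byte[]) callers map onto the same path.
final class ByteArrayHashKey implements HashKey {
    private final byte[] bytes;

    ByteArrayHashKey(byte[] bytes) { this.bytes = bytes; }

    public int length() { return bytes.length; }

    public byte get(int pos) { return bytes[pos]; }
}

final class Hash {
    // The single entry point. 32-bit FNV-1a is used purely as a placeholder.
    static int hash(HashKey key) {
        int h = 0x811c9dc5;
        for (int i = 0; i < key.length(); i++) {
            h ^= (key.get(i) & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

With this shape the hash function never learns about the bloom type. The
caller picks RowHashKey or RowColHashKey, and no temporary Cell or byte[] copy
is created for the ROW_COL case.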

> StoreFile$Writer.appendGeneralBloomFilter generates extra KV
> ------------------------------------------------------------
>
>                 Key: HBASE-15554
>                 URL: https://issues.apache.org/jira/browse/HBASE-15554
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Vladimir Rodionov
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15554.patch, HBASE-15554_3.patch, 
> HBASE-15554_4.patch, HBASE-15554_6.patch, HBASE-15554_7.patch
>
>
> Accounts for 10% memory allocation in compaction thread when BloomFilterType 
> is ROWCOL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
