[ 
https://issues.apache.org/jira/browse/HADOOP-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561173#action_12561173
 ] 

stack commented on HADOOP-2604:
-------------------------------

Bloom Filters:

+ Since the change to one Memcache per Store (rather than one for a whole 
Region), we know the number of elements we're about to flush out to a Store 
file.  That means we can pick an optimal bloom filter size at flush time, so 
bloom filters could be enabled by default.
+ We currently provide a choice of implementations: General, Counting, and 
Dynamic.  I do not see where we would ever use anything but a General bloom 
filter (Counting adds deletions; Dynamic allows resizing).  Therefore, I'd 
suggest we remove the choice of implementations.
+ Bloom filters are not as effective as they could be because the most 
popular lookup will be for the 'latest' version of a cell: i.e. the lookup is 
not for an explicit cell (row/column/ts) but for the most recent version of 
the cell.  So, bloom filters should be keyed on row/column and probably not 
ts.  We would then have to fetch the cell itself to learn its actual ts.
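
To make the first and third points concrete, here is a minimal sketch, not 
HBase's actual filter code.  The class name RowColumnBloom, the seeded hash, 
and the String row/column types are all made up for illustration.  It sizes 
the filter from a known entry count n and a target false-positive rate p 
using the standard formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2, and it 
keys entries on row/column only, leaving ts out.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch: a bloom filter sized at flush time, keyed on row/column. */
class RowColumnBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** n = number of entries about to be flushed, p = target false-positive rate. */
    RowColumnBloom(int n, double p) {
        this.numBits = optimalBits(n, p);
        this.numHashes = optimalHashes(n, numBits);
        this.bits = new BitSet(numBits);
    }

    /** m = -n ln p / (ln 2)^2, rounded up. */
    static int optimalBits(int n, double p) {
        double ln2 = Math.log(2);
        return Math.max(1, (int) Math.ceil(-n * Math.log(p) / (ln2 * ln2)));
    }

    /** k = (m / n) ln 2, rounded to the nearest integer, at least 1. */
    static int optimalHashes(int n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** The filter key is row + column; the timestamp is deliberately omitted. */
    private static byte[] key(String row, String column) {
        return (row + "\u0000" + column).getBytes(StandardCharsets.UTF_8);
    }

    /** Seeded FNV-1a-style hash; an assumption, any decent hash would do. */
    private static int hash(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    void add(String row, String column) {
        byte[] k = key(row, column);
        int h1 = hash(k, 0x9747b28c);
        int h2 = hash(k, 0x5bd1e995);
        // Derive the k bit positions from two base hashes (double hashing).
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    /** False means definitely absent; true means some version may be present. */
    boolean mightContain(String row, String column) {
        byte[] k = key(row, column);
        int h1 = hash(k, 0x9747b28c);
        int h2 = hash(k, 0x5bd1e995);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }
}
```

A 'latest version' lookup would ask mightContain(row, column) first and only 
open the Store file on a true answer; the ts still has to come from the cell.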

Mapfile Indices:

+ If the index had an entry for every row/column/ts in a Store file/MapFile, 
we wouldn't need a bloom filter (but it would consume far more memory!).
+ Chatting w/ Bryan: mapfile indices could be kept in an LRU.  We'd add a 
means of asking a mapfile for its index, then shove it into an LRU or into a 
Reference Map (with the latter, the index would be dropped when memory ran 
low and refetched on next access).
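
A rough sketch of the LRU variant, under stated assumptions: IndexLruCache is 
a made-up name, and the loader function stands in for the not-yet-existing 
means of asking a mapfile for its index.  It uses java.util.LinkedHashMap in 
access order, so the least recently used index is dropped once capacity is 
exceeded and reloaded on the next access.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: an LRU of mapfile indices, reloaded on a miss. */
class IndexLruCache<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader;

    /**
     * @param capacity maximum number of indices held in memory
     * @param loader   stand-in for "ask the mapfile for its index"
     */
    IndexLruCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached index, fetching it again if it was evicted. */
    synchronized V get(K file) {
        V index = cache.get(file);
        if (index == null) {
            index = loader.apply(file);
            cache.put(file, index);
        }
        return index;
    }

    /** True if the index is currently held; does not count as an access. */
    synchronized boolean isCached(K file) {
        return cache.containsKey(file);
    }
}
```

The Reference Map variant would instead hold each index behind a 
java.lang.ref.SoftReference and let the garbage collector decide when to drop 
it, at the cost of less predictable eviction.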

> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>
>                 Key: HADOOP-2604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2604
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HBase uses the Hadoop MapFile class to store data persistently to 
> disk. This is convenient, as it's already done (and maintained by other 
> people :). However, it's beginning to look like there might be possible 
> performance benefits to be had from doing an HBase-specific implementation of 
> MapFile that incorporated some precise features.
> This issue should serve as a place to track discussion about what features 
> might be included in such an implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
