[
https://issues.apache.org/jira/browse/HADOOP-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561173#action_12561173
]
stack commented on HADOOP-2604:
-------------------------------
Bloom Filters:
+ It turns out that, particularly since the change that gave each Store its
own Memcache rather than one for the whole Region, we know the number of
elements we are about to flush out to a Store file. That means we can pick an
optimal bloom filter size up front, so bloom filters could be enabled by
default.
+ We currently provide a choice: General, Counting, and Dynamic. I do not see
where we would ever use anything but a General bloom filter (Counting adds
support for deletions; Dynamic allows resizing). Therefore, I'd suggest we
remove the choice of implementations.
+ Bloom filters are not as effective as they could be, given that the most
popular lookup will be for the 'latest' version of a cell: i.e. the lookup is
not for an explicit cell -- row/column/ts -- but for the most recent version of
the cell. So, bloom filters should be populated by row/column and probably not
ts. We would then have to fetch the cell itself to learn its actual ts.
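To make the sizing and keying points above concrete, here is a minimal sketch,
not the existing HBase bloom filter code. The class name RowColumnBloom, the
String row/column types, and the hashing scheme are all assumptions made for
illustration. It sizes the filter with the standard formulas
m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2 from the entry count known at
flush time, and keys entries by row/column only, leaving out the ts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch: a plain ("General") bloom filter keyed by row/column. */
class RowColumnBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** expectedEntries is the element count known when the Memcache is flushed. */
    RowColumnBloom(long expectedEntries, double falsePositiveRate) {
        long n = Math.max(1L, expectedEntries);
        this.numBits = optimalBits(n, falsePositiveRate);
        this.numHashes = optimalHashes(numBits, n);
        this.bits = new BitSet(numBits);
    }

    /** m = -n * ln(p) / (ln 2)^2, rounded up and clamped to the int range. */
    static int optimalBits(long n, double p) {
        double m = -n * Math.log(p) / (Math.log(2) * Math.log(2));
        return (int) Math.min((double) Integer.MAX_VALUE, Math.max(1.0, Math.ceil(m)));
    }

    /** k = (m / n) * ln 2, at least one hash function. */
    static int optimalHashes(int m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    void add(String row, String column) {
        for (int pos : positions(row, column)) {
            bits.set(pos);
        }
    }

    /** false means definitely absent; true means the Store file must be checked. */
    boolean mightContain(String row, String column) {
        for (int pos : positions(row, column)) {
            if (!bits.get(pos)) {
                return false;
            }
        }
        return true;
    }

    /** The key is row + column only; the timestamp is deliberately left out. */
    private int[] positions(String row, String column) {
        byte[] key = (row + '\u0000' + column).getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // odd step for double hashing
        int[] out = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            out[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return out;
    }

    /** 32-bit FNV-1a, used here only as a second, independent-ish hash. */
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

With a filter like this, a 'latest version' lookup would ask
mightContain(row, column) and only open the Store file on a positive answer;
the actual ts still has to come from the fetched cell.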
Mapfile Indices:
+ If the index had an entry for every row/column/ts entry in a Store
file/MapFile, we wouldn't need a bloom filter (but it would consume far more
memory!).
+ Chatting w/ Bryan, mapfile indices could be kept in an LRU. We'd add a means
of asking a mapfile for its index. We'd shove it into an LRU or into a
Reference Map (for the latter, when memory is low, the index would be dropped
and then refetched on next access).
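As a rough illustration of the LRU idea above, the following sketch keeps at
most a fixed number of mapfile indices in memory and reloads an evicted index
on its next access. IndexLruCache and its loader function are hypothetical
names; the loader stands in for the proposed "ask a mapfile for its index"
call, which does not exist yet. It is built on java.util.LinkedHashMap in
access order, not on any Hadoop class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: an LRU cache of mapfile indices keyed by file. */
class IndexLruCache<K, V> {
    private final int capacity;
    private final Function<K, V> loader;
    private final LinkedHashMap<K, V> map;

    /**
     * capacity is the maximum number of indices held in memory;
     * loader is an assumed callback that reads a mapfile's index from disk.
     */
    IndexLruCache(int capacity, Function<K, V> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > IndexLruCache.this.capacity;
            }
        };
    }

    /** Returns the cached index, loading (and possibly evicting) on a miss. */
    synchronized V get(K file) {
        V index = map.get(file);
        if (index == null) {
            index = loader.apply(file);
            map.put(file, index);
        }
        return index;
    }

    /** Reports whether an index is resident without touching the LRU order. */
    synchronized boolean isCached(K file) {
        return map.containsKey(file);
    }
}
```

The Reference Map variant would replace the fixed capacity with soft
references, so that the garbage collector rather than a counter decides when
an index is dropped; the reload-on-miss path in get() would stay the same.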
> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>
> Key: HADOOP-2604
> URL: https://issues.apache.org/jira/browse/HADOOP-2604
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Reporter: Bryan Duxbury
> Priority: Minor
>
> Today, HBase uses the Hadoop MapFile class to store data persistently to
> disk. This is convenient, as it's already done (and maintained by other
> people :). However, it's beginning to look like there may be performance
> benefits to be had from an HBase-specific implementation of MapFile that
> incorporates some specific features.
> This issue should serve as a place to track discussion about what features
> might be included in such an implementation.