[ https://issues.apache.org/jira/browse/HBASE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Spiegelberg updated HBASE-1200:
---------------------------------------
Summary: Add bloomfilters (was: Add bloomfilters; use
dynamicbloomfilter instead of base bloomfilter)
Description: Add bloomfiltering to hfile. Can be enabled on a family-level
basis. Ability to configure a row vs row+col level bloom. We size the
bloomfilter with the number of entries we are about to flush, which usually
seems to make the filter too big, so our implementation needs to take
that into account. (was: Add bloomfiltering to hfile. Should it be optional
or always on? Currently, we bloom filter rows only, not the column + ts
component, which seems a good place to start, but we size the bloomfilter with
the number of entries we are about to flush, which usually seems to make the
filter too big. How do we figure out how many rows are in the flush? We should
use the DynamicBloomFilter as Andrzej does up in hadoop BloomFilterMapFile.
Start small and let it resize as entries are added.)
Updating the title & description text. Note that I took out the
DynamicBloomFilter requirement. I will send out a document to complement the
code fix, covering the implementation reasoning and possible future
alternatives.
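To illustrate the sizing concern in the description, here is a minimal sketch using the standard Bloom filter formulas, not the code in the attached patch. The class name, method names, entry counts, and the 1% target error rate are all assumptions chosen for illustration. It shows that a row-level filter sized for every entry in a flush ends up several times larger than one sized for the distinct rows actually inserted.

```java
// Hypothetical helper, not part of HBase: standard Bloom filter sizing math.
class BloomSizing {
    /** Optimal bit count m = -n * ln(p) / (ln 2)^2 for n keys at false-positive rate p. */
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Optimal hash count k = (m / n) * ln 2, never less than 1. */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Expected false-positive rate after n insertions: (1 - e^(-k*n/m))^k. */
    static double falsePositiveRate(long m, int k, long n) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long entries = 1_000_000L; // assumed number of entries in the flush
        long rows = 100_000L;      // assumed number of distinct rows among them
        double target = 0.01;      // assumed target false-positive rate

        long bitsForEntries = optimalBits(entries, target);
        long bitsForRows = optimalBits(rows, target);
        int k = optimalHashes(bitsForEntries, entries);

        System.out.println("bits when sized by entries: " + bitsForEntries);
        System.out.println("bits when sized by rows:    " + bitsForRows);
        System.out.println("hash functions:             " + k);
        // With only `rows` keys inserted, the oversized filter is far below the
        // target error rate, which is space spent for accuracy nobody asked for.
        System.out.println("actual error rate:          "
                + falsePositiveRate(bitsForEntries, k, rows));
    }
}
```

Under these assumptions the entry-sized filter uses roughly ten times the bits of the row-sized one, which is the overhead the implementation has to account for.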
> Add bloomfilters
> ----------------
>
> Key: HBASE-1200
> URL: https://issues.apache.org/jira/browse/HBASE-1200
> Project: Hadoop HBase
> Issue Type: Task
> Reporter: stack
> Assignee: Nicolas Spiegelberg
> Fix For: 0.21.0
>
> Attachments: ryan_bloomfilter.patch
>
>
> Add bloomfiltering to hfile. Can be enabled on a family-level basis.
> Ability to configure a row vs row+col level bloom. We size the bloomfilter
> with the number of entries we are about to flush, which usually seems to
> make the filter too big, so our implementation needs to take that into
> account.
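The row vs row+col choice comes down to which bytes are hashed into the filter. The sketch below is one possible way to build that key, assuming a hypothetical BloomGranularity enum and bloomKey helper. Neither name is taken from the patch, and the length prefix is just one way to keep row and column bytes from running together.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not the HBASE-1200 implementation.
class BloomKeys {
    enum BloomGranularity { ROW, ROW_COL }

    /** Returns the bytes that would be hashed into the Bloom filter. */
    static byte[] bloomKey(BloomGranularity g, byte[] row, byte[] qualifier) {
        if (g == BloomGranularity.ROW) {
            return row.clone(); // every column of a row shares one key
        }
        // Length-prefix the row so ("ab", "c") and ("a", "bc") yield different keys.
        return ByteBuffer.allocate(4 + row.length + qualifier.length)
                .putInt(row.length)
                .put(row)
                .put(qualifier)
                .array();
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] colA = "a".getBytes(StandardCharsets.UTF_8);
        byte[] colB = "b".getBytes(StandardCharsets.UTF_8);

        // Row-level: both columns map to the same filter key.
        System.out.println(Arrays.equals(
                bloomKey(BloomGranularity.ROW, row, colA),
                bloomKey(BloomGranularity.ROW, row, colB)));
        // Row+col level: each column gets its own filter key.
        System.out.println(Arrays.equals(
                bloomKey(BloomGranularity.ROW_COL, row, colA),
                bloomKey(BloomGranularity.ROW_COL, row, colB)));
    }
}
```

A row-level filter holds one key per distinct row, while a row+col filter holds one per distinct row and column pair, so the two settings need different entry counts when sizing.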