[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439311#comment-13439311 ]
Anil Gupta commented on HBASE-6618:
-----------------------------------
Hi Alex,
I agree with your idea of a range-based fuzzy filter. However, I would like to
take a phased approach to developing it:
In your proposal, the user can provide multiple fuzzy ranges in a single scan,
e.g. <any 4 bytes><any 6 bytes value between "_0001" and "_0099"><any 3 bytes>
<any 4 bytes value between "_001" and "_099">
Instead of the above, IMO let's first build a filter for a single fuzzy range,
e.g. "<any 4 bytes><any 6 bytes value between "_0001" and "_0099"><any 3 bytes>"
or "<any 4 bytes><any 6 bytes value between "_0001" and "_0099">". Once that
works, we can extend it to support multiple fuzzy ranges. That is just my
thinking on how to approach it; let me know your opinion.
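To make the single-range case concrete, here is a minimal Java sketch of checking a row key against the pattern <any N bytes><fixed-length value between LOWER and UPPER><any trailing bytes>. The class name SingleFuzzyRangeMatcher, its constructor parameters, and the choice to leave trailing bytes unconstrained are all hypothetical; this is not the FuzzyRowFilter API. Bounds are compared as unsigned bytes in lexicographic order, which is how HBase orders row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical matcher for a single fuzzy range:
 * <any prefixLen bytes><value in [lower, upper], lower.length bytes><any trailing bytes>.
 */
public final class SingleFuzzyRangeMatcher {
    private final int prefixLen; // number of "don't care" leading bytes
    private final byte[] lower;  // inclusive lower bound of the ranged part
    private final byte[] upper;  // inclusive upper bound of the ranged part

    public SingleFuzzyRangeMatcher(int prefixLen, byte[] lower, byte[] upper) {
        if (lower.length != upper.length) {
            throw new IllegalArgumentException("bounds must have the same length");
        }
        if (Arrays.compareUnsigned(lower, upper) > 0) {
            throw new IllegalArgumentException("lower bound must not exceed upper bound");
        }
        this.prefixLen = prefixLen;
        this.lower = lower.clone();
        this.upper = upper.clone();
    }

    /** True if the row's ranged part lies within [lower, upper] in unsigned byte order. */
    public boolean matches(byte[] row) {
        int from = prefixLen;
        int to = prefixLen + lower.length;
        if (row.length < to) {
            return false; // too short to contain the ranged part
        }
        return Arrays.compareUnsigned(row, from, to, lower, 0, lower.length) >= 0
            && Arrays.compareUnsigned(row, from, to, upper, 0, upper.length) <= 0;
    }

    public static void main(String[] args) {
        // <any 4 bytes><value between "_0001" and "_0099"><anything>
        SingleFuzzyRangeMatcher m = new SingleFuzzyRangeMatcher(
            4,
            "_0001".getBytes(StandardCharsets.US_ASCII),
            "_0099".getBytes(StandardCharsets.US_ASCII));
        System.out.println(m.matches("abcd_0042xyz".getBytes(StandardCharsets.US_ASCII))); // true
        System.out.println(m.matches("abcd_0100".getBytes(StandardCharsets.US_ASCII)));    // false
        System.out.println(m.matches("abcd_00".getBytes(StandardCharsets.US_ASCII)));      // false
    }
}
```

Supporting several ranges later would amount to chaining such segments, which is why starting with one range keeps the first version small.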
From this week on, I have had to shift my focus at work from HBase to Hive and
HCatalog for another POC, so I'll be squeezing in time for this JIRA outside of
my work schedule. I'll start by looking into the current implementation of
FuzzyRowFilter to get an idea of how it is built.
Thanks,
Anil Gupta
Software Engineer II, Intuit, Inc
> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
> Key: HBASE-6618
> URL: https://issues.apache.org/jira/browse/HBASE-6618
> Project: HBase
> Issue Type: New Feature
> Components: filters
> Reporter: Alex Baranau
> Priority: Minor
>
> Apart from the current ability to specify a fuzzy row filter, e.g. for the
> <userId_actionId> format as ????_0004 (where 0004 is the actionId), it would
> be great to also be able to specify a "fuzzy range", e.g. ????_0004,
> ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: it is currently possible to provide multiple fuzzy row rules to the
> existing FuzzyRowFilter, but when the range is big (contains thousands of
> values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is
> what distinguishes it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e.
> not included in the standard filter set), the filter looks likely to be very
> reusable. We can judge based on the implementation that will hopefully be
> added.
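The fast-forwarding requirement in the description above is what would set a range filter apart from both a regex row filter and a long list of fuzzy rules: instead of testing every row, the filter can compute the next row key that could possibly match and let the scanner seek there. Below is a minimal Java sketch for the single-range layout <any N bytes><value in [LOWER, UPPER]><any trailing bytes>. The class FuzzyRangeHint and its method nextRowHint are hypothetical names, and wiring the hint into an actual HBase filter is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical next-row-hint computation for the layout
 * <any prefixLen bytes><value in [lower, upper]><any trailing bytes>.
 */
public final class FuzzyRangeHint {
    private final int prefixLen;
    private final byte[] lower;
    private final byte[] upper;

    public FuzzyRangeHint(int prefixLen, byte[] lower, byte[] upper) {
        // Assumes lower.length == upper.length and lower <= upper (unsigned order).
        this.prefixLen = prefixLen;
        this.lower = lower.clone();
        this.upper = upper.clone();
    }

    /**
     * For a row that does NOT match, returns the smallest row key after it that
     * matches, or null if no later row can match. Keys are compared as unsigned
     * bytes, the order HBase uses for row keys.
     */
    public byte[] nextRowHint(byte[] row) {
        int end = prefixLen + lower.length;
        boolean tooShort = row.length < end;
        // A too-short row is padded with 0x00; no matching key lies between the
        // row and its padded form, so the padded key is a safe starting point.
        byte[] key = tooShort ? Arrays.copyOf(row, end) : row;
        if (Arrays.compareUnsigned(key, prefixLen, end, lower, 0, lower.length) < 0) {
            // Ranged part is below the range: keep the prefix, jump to the lower bound.
            return concat(Arrays.copyOf(key, prefixLen), lower);
        }
        if (Arrays.compareUnsigned(key, prefixLen, end, upper, 0, upper.length) > 0) {
            // Ranged part is above the range: move on to the next prefix.
            byte[] next = increment(Arrays.copyOf(key, prefixLen));
            return next == null ? null : concat(next, lower);
        }
        if (tooShort) {
            return key; // the padded key is itself the next match
        }
        throw new IllegalArgumentException("row already matches; no hint needed");
    }

    /** Unsigned big-endian increment in place; returns null on overflow (all 0xFF). */
    private static byte[] increment(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                prefix[i]++;
                return prefix;
            }
            prefix[i] = 0; // carry into the previous byte
        }
        return null;
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        FuzzyRangeHint h = new FuzzyRangeHint(4,
            "_0001".getBytes(StandardCharsets.US_ASCII),
            "_0099".getBytes(StandardCharsets.US_ASCII));
        byte[] below = h.nextRowHint("abcd_0000".getBytes(StandardCharsets.US_ASCII));
        byte[] above = h.nextRowHint("abcd_0100".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(below, StandardCharsets.US_ASCII)); // abcd_0001
        System.out.println(new String(above, StandardCharsets.US_ASCII)); // abce_0001
    }
}
```

In a real filter, such a hint would presumably be paired with HBase's SEEK_NEXT_USING_HINT return code so the scanner skips straight to it. The hook that supplies the hint key has differed between HBase versions, so treat this only as an outline of the seek logic.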