[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jiajia updated HBASE-11144:
------------------------------

    Description: 
HBase is quite efficient when scanning a single small row key range. If a user 
needs to specify multiple row key ranges in one scan, the typical solutions 
are: 1. a FilterList containing a list of row key filters, or 2. a SQL layer 
over HBase (such as Hive or Phoenix) that joins two tables. However, both 
solutions are inefficient: neither can use the range information to 
fast-forward during the scan, so all rows are scanned, which is quite time 
consuming. If the number of ranges is very large (e.g. millions), a join is a 
proper solution even though it is slow. However, there are cases where a user 
wants to specify a small number of ranges to scan (e.g. <1000 ranges), and 
neither solution provides satisfactory performance in such cases. 
We provide this filter (MultiRowRangeFilter) to support this use case (scanning 
multiple row key ranges). It constructs the row key ranges from a 
user-specified sorted list and fast-forwards during the scan to skip unwanted 
rows. Thus, the scan will be quite efficient. 
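
The fast-forwarding idea can be illustrated with a minimal, self-contained 
sketch. This is not the code from the attached patches and it does not use the 
HBase API: row keys are modeled as Strings rather than byte arrays, the table 
is modeled as a sorted in-memory set, ranges are assumed to be half-open, 
sorted and non-overlapping, and the names RangeSkipSketch, RowRange, 
nextRowToRead and scan are hypothetical. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class RangeSkipSketch {
    /** Half-open row key range [start, stop). Hypothetical type for illustration. */
    static final class RowRange {
        final String start;
        final String stop;
        RowRange(String start, String stop) { this.start = start; this.stop = stop; }
    }

    /**
     * Given ranges sorted by start key, returns the row itself if it lies inside
     * a range, the start of the next range if it lies in a gap (the seek target),
     * or null if the row is past the last range.
     */
    static String nextRowToRead(List<RowRange> ranges, String row) {
        int lo = 0, hi = ranges.size() - 1, idx = -1;
        // Binary search for the last range whose start key is <= row.
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (ranges.get(mid).start.compareTo(row) <= 0) { idx = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        if (idx >= 0 && row.compareTo(ranges.get(idx).stop) < 0) {
            return row; // inside ranges[idx]
        }
        int next = idx + 1;
        return next < ranges.size() ? ranges.get(next).start : null;
    }

    /** Simulated scan over a sorted set of row keys that jumps over the gaps. */
    static List<String> scan(NavigableSet<String> table, List<RowRange> ranges) {
        List<String> out = new ArrayList<>();
        String row = table.isEmpty() ? null : table.first();
        while (row != null) {
            String target = nextRowToRead(ranges, row);
            if (target == null) break;            // past the last range: stop
            if (target.equals(row)) {
                out.add(row);                     // wanted row: emit it
                row = table.higher(row);          // advance to the next row
            } else {
                row = table.ceiling(target);      // gap: seek to the next range start
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> table =
            new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
        List<RowRange> ranges =
            Arrays.asList(new RowRange("b", "d"), new RowRange("f", "g"));
        System.out.println(scan(table, ranges)); // prints [b, c, f]
    }
}
```

In this sketch the rows in the gap between the two ranges are never visited 
one by one: a single seek moves the scan from "d" to "f". A list of 
independent row key filters, by contrast, would have to be evaluated against 
every row. 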

  was:
Provide a filter feature to support scanning multiple row key ranges. It can 
construct the row key ranges from the passed list, which can be accessed by 
each region server. 



> Filter to support scan multiple row key ranges
> ----------------------------------------------
>
>                 Key: HBASE-11144
>                 URL: https://issues.apache.org/jira/browse/HBASE-11144
>             Project: HBase
>          Issue Type: Improvement
>          Components: Filters
>            Reporter: Li Jiajia
>         Attachments: MultiRowRangeFilter.patch, MultiRowRangeFilter2.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
