[ 
https://issues.apache.org/jira/browse/HBASE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207257#comment-16207257
 ] 

Zheng Hu edited comment on HBASE-18368 at 10/17/17 10:24 AM:
-------------------------------------------------------------

I uploaded an image describing the NEXT_ROW behavior in RegionScanner and 
StoreScanner.

https://issues.apache.org/jira/secure/attachment/12892568/next-row-behavior-in-regionScanner-and-storeScanner.jpg

Assume that there are two column families, cf1 and cf2. For StoreScanner, the 
NEXT_ROW return code skips to the next row in the current family, cf1 (as the 
red line shows). For RegionScanner, however, the StoreScanner of cf1 skips to 
the next row in family cf1, and then the storeHeap in RegionScanner picks the 
minimal store scanner, which is the one for family cf2, to read the next cell. 
So RegionScanner actually performs two steps: it skips to the next row in 
family cf1, and it switches the current StoreScanner to cf2. That is why 
FamilyFilter can be optimized with the NEXT_ROW return code (as the blue line 
shows).
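The two behaviors can be illustrated with a minimal, HBase-free toy model. Everything in it is a hypothetical simplification, not HBase code: `Cell` carries only row, family and qualifier, `ToyStoreScanner` holds one family's sorted cells (its `skipToNextRow` plays the CF-level NEXT_ROW), and a `PriorityQueue` ordered by (row, family) stands in for RegionScanner's storeHeap. The filter is modeled as FamilyFilter(EQUAL, wantedFamily), returning NEXT_ROW for any other family.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class NextRowDemo {
  /** Hypothetical simplified cell: no timestamps, types or values. */
  record Cell(String row, String family, String qualifier) {}

  enum ReturnCode { INCLUDE, NEXT_ROW }

  /** Toy stand-in for a StoreScanner: the sorted cells of a single column family. */
  static final class ToyStoreScanner {
    final String family;
    final List<Cell> cells;
    int pos = 0;

    ToyStoreScanner(String family, List<Cell> cells) {
      this.family = family;
      this.cells = cells;
    }

    Cell peek() {
      return pos < cells.size() ? cells.get(pos) : null;
    }

    void next() {
      pos++;
    }

    /** CF-level NEXT_ROW: skip the rest of the current row in THIS family only. */
    void skipToNextRow() {
      String row = peek().row();
      while (peek() != null && peek().row().equals(row)) {
        pos++;
      }
    }
  }

  /** Toy region scan: the heap always serves the store with the smallest (row, family). */
  static List<Cell> scanRegion(List<ToyStoreScanner> stores, String wantedFamily) {
    Comparator<ToyStoreScanner> order =
        Comparator.comparing((ToyStoreScanner s) -> s.peek().row()).thenComparing(s -> s.family);
    PriorityQueue<ToyStoreScanner> storeHeap = new PriorityQueue<>(order);
    for (ToyStoreScanner s : stores) {
      if (s.peek() != null) storeHeap.add(s);
    }
    List<Cell> results = new ArrayList<>();
    while (!storeHeap.isEmpty()) {
      ToyStoreScanner current = storeHeap.poll();
      Cell cell = current.peek();
      // Simplified FamilyFilter(EQUAL, wantedFamily).
      ReturnCode rc = cell.family().equals(wantedFamily) ? ReturnCode.INCLUDE : ReturnCode.NEXT_ROW;
      if (rc == ReturnCode.INCLUDE) {
        results.add(cell);
        current.next();
      } else {
        current.skipToNextRow(); // only this family's scanner moves forward
      }
      // Re-insert: the next poll may switch to another family in the same row.
      if (current.peek() != null) storeHeap.add(current);
    }
    return results;
  }

  static List<ToyStoreScanner> sampleStores() {
    ToyStoreScanner cf1 = new ToyStoreScanner("cf1", List.of(
        new Cell("0", "cf1", "col_a"), new Cell("0", "cf1", "col_c"), new Cell("1", "cf1", "col_a")));
    ToyStoreScanner cf2 = new ToyStoreScanner("cf2", List.of(
        new Cell("0", "cf2", "col_b"), new Cell("1", "cf2", "col_b")));
    return List.of(cf1, cf2);
  }

  public static void main(String[] args) {
    System.out.println(scanRegion(sampleStores(), "cf2"));
  }
}
```

In this model, NEXT_ROW on cf1's first cell only moves cf1's scanner to row "1"; the heap then serves cf2's cell in row "0", so the region-level scan still returns both cf2 cells, while a single store scanning cf1 alone skips `col_c` entirely.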

Here, we can define the NEXT_ROW return code more clearly: at the CF level, 
NEXT_ROW skips to the next row in the current family; at the region level, 
NEXT_ROW skips to the next row in the current family and then switches 
RegionScanner to the next family.

So the patch for this issue should be straightforward:

1. Make the NEXT_ROW definition clearer in the JavaDoc.
2. Keep the behavior consistent with that definition of NEXT_ROW.
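Step 1 of the plan could look roughly like the following sketch. `SketchReturnCode` is a hypothetical stand-in, not the real HBase `Filter.ReturnCode` source, and the JavaDoc wording is only a suggestion derived from the two-level definition of NEXT_ROW given in this comment.

```java
/**
 * Hypothetical sketch of clearer JavaDoc for filter return codes. This is an
 * illustrative stand-in, not the actual HBase Filter.ReturnCode enum.
 */
enum SketchReturnCode {
  /** Include the current Cell. */
  INCLUDE,

  /** Skip the current Cell. */
  SKIP,

  /**
   * Skip all remaining Cells of the current row in the current column family.
   *
   * <ul>
   *   <li>CF level (StoreScanner): seek to the next row within the current family.</li>
   *   <li>Region level (RegionScanner): the current family's StoreScanner seeks to its
   *       next row, after which the region's store heap continues with the next family
   *       that still has Cells, which may belong to the same row.</li>
   * </ul>
   */
  NEXT_ROW
}
```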


> FilterList with multiple FamilyFilters concatenated by OR does not work.
> ------------------------------------------------------------------------
>
>                 Key: HBASE-18368
>                 URL: https://issues.apache.org/jira/browse/HBASE-18368
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Filters
>    Affects Versions: 3.0.0, 2.0.0-alpha-1
>            Reporter: Peter Somogyi
>            Assignee: Zheng Hu
>            Priority: Critical
>         Attachments: HBASE-18368.branch-1.patch, 
> HBASE-18368.branch-1.v2.patch, HBASE-18368.branch-1.v3.patch, 
> HBASE-18368.patch, HBASE-18368.v2.patch, HBASE-18368.v3.patch, 
> HBASE-18368.v3.patch, next-row-behavior-in-regionScanner-and-storeScanner.jpg
>
>
> A scan returns an incomplete result list if multiple filters are combined with
> OR / MUST_PASS_ONE.
> Using two FamilyFilters in a FilterList with the MUST_PASS_ONE operator returns
> results for only the first filter.
> {code:java|title=Test code}
>   @Test
>   public void testFiltersWithOr() throws Exception {
>     TableName tn = TableName.valueOf("MyTest");
>     Table table = utility.createTable(tn, new String[] {"cf1", "cf2"});
>     byte[] CF1 = Bytes.toBytes("cf1");
>     byte[] CF2 = Bytes.toBytes("cf2");
>     Put put1 = new Put(Bytes.toBytes("0"));
>     put1.addColumn(CF1, Bytes.toBytes("col_a"), Bytes.toBytes(0));
>     table.put(put1);
>     Put put2 = new Put(Bytes.toBytes("0"));
>     put2.addColumn(CF2, Bytes.toBytes("col_b"), Bytes.toBytes(0));
>     table.put(put2);
>     FamilyFilter filterCF1 = new FamilyFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(CF1));
>     FamilyFilter filterCF2 = new FamilyFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(CF2));
>     FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
>     filterList.addFilter(filterCF1);
>     filterList.addFilter(filterCF2);
>     Scan scan = new Scan();
>     scan.setFilter(filterList);
>     ResultScanner scanner = table.getScanner(scan);
>     System.out.println(filterList);
>     for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
>       System.out.println(rr);
>     }
>   }
> {code}
> {noformat:title=Output}
> FilterList OR (2/2): [FamilyFilter (EQUAL, cf1), FamilyFilter (EQUAL, cf2)]
> keyvalues={0/cf1:col_a/1499852754957/Put/vlen=4/seqid=0}
> {noformat}
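For the test code in the issue, the expected output would list both cells of row "0" (cf1:col_a and cf2:col_b), not just the cf1 cell. A toy, HBase-free model of that expectation follows. The names are hypothetical, and the merge rule (INCLUDE if any sub-filter returns INCLUDE, otherwise NEXT_ROW) is a simplification of MUST_PASS_ONE for this two-code case, not HBase's actual FilterList implementation. NEXT_ROW here skips the rest of the row only within the current family, per the region-level definition in the comment.

```java
import java.util.ArrayList;
import java.util.List;

final class OrFamilyFilterDemo {
  /** Hypothetical simplified cell. */
  record Cell(String row, String family, String qualifier) {}

  enum ReturnCode { INCLUDE, NEXT_ROW }

  /** Simplified FamilyFilter(EQUAL, family): NEXT_ROW for any other family. */
  static ReturnCode familyFilter(String family, Cell cell) {
    return cell.family().equals(family) ? ReturnCode.INCLUDE : ReturnCode.NEXT_ROW;
  }

  /** Simplified MUST_PASS_ONE: INCLUDE if any sub-filter includes the cell. */
  static ReturnCode mustPassOne(List<String> families, Cell cell) {
    for (String f : families) {
      if (familyFilter(f, cell) == ReturnCode.INCLUDE) return ReturnCode.INCLUDE;
    }
    return ReturnCode.NEXT_ROW;
  }

  /** Cells must be sorted by (row, family, qualifier). */
  static List<Cell> scan(List<Cell> cells, List<String> families) {
    List<Cell> results = new ArrayList<>();
    int i = 0;
    while (i < cells.size()) {
      Cell cell = cells.get(i);
      if (mustPassOne(families, cell) == ReturnCode.INCLUDE) {
        results.add(cell);
        i++;
      } else {
        // NEXT_ROW: skip the remaining cells of this row in this family only.
        while (i < cells.size()
            && cells.get(i).row().equals(cell.row())
            && cells.get(i).family().equals(cell.family())) {
          i++;
        }
      }
    }
    return results;
  }

  public static void main(String[] args) {
    List<Cell> cells = List.of(new Cell("0", "cf1", "col_a"), new Cell("0", "cf2", "col_b"));
    System.out.println(scan(cells, List.of("cf1", "cf2")));
  }
}
```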



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
