[ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198091#comment-16198091
 ] 

Hudson commented on HBASE-17678:
--------------------------------

FAILURE: Integrated in Jenkins build HBase-2.0 #658 (See 
[https://builds.apache.org/job/HBase-2.0/658/])
Revert "HBASE-17678 FilterList with MUST_PASS_ONE lead to redundancy (busbey: 
rev 852b5783841aeb2a55b9ec6dd7ef37b4f2269b1c)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
Revert "HBASE-17678 FilterList with MUST_PASS_ONE may lead to redundant 
(busbey: rev 38e52bb29e173c0e47f29214618133e09cd5f96b)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java


> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> ------------------------------------------------------------------
>
>                 Key: HBASE-17678
>                 URL: https://issues.apache.org/jira/browse/HBASE-17678
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Filters
>    Affects Versions: 2.0.0, 1.3.0, 1.2.1
>         Environment: RedHat 7.x
>            Reporter: Jason Tokayer
>            Assignee: Zheng Hu
>         Attachments: HBASE-17678.addendum.patch, HBASE-17678.addendum.patch, 
> HBASE-17678.branch-1.1.v1.patch, HBASE-17678.branch-1.1.v2.patch, 
> HBASE-17678.branch-1.1.v2.patch, HBASE-17678.branch-1.1.v2.patch, 
> HBASE-17678.branch-1.v1.patch, HBASE-17678.branch-1.v1.patch, 
> HBASE-17678.branch-1.v2.patch, HBASE-17678.branch-1.v2.patch, 
> HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch, HBASE-17678.v2.patch, 
> HBASE-17678.v3.patch, HBASE-17678.v4.patch, HBASE-17678.v4.patch, 
> HBASE-17678.v5.patch, HBASE-17678.v6.patch, HBASE-17678.v7.patch, 
> HBASE-17678.v7.patch, TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I would expect MUST_PASS_ALL and 
> MUST_PASS_ONE to affect only the behavior of the joined filter and not 
> the behavior of any one of the individual filters. If this is not a bug then 
> it would be nice if the documentation were updated to explain this nuanced 
> behavior.
> I know that there was a decision made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> put 'ns:tbl','row','family:name','Gil',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> {code}
> Then, use a Scala client as follows:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>       val table = connection.getTable(TableName.valueOf("ns:tbl"))
>       val paginationFilter = new ColumnPaginationFilter(limit,offset)
>       val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
>       println("@ filterList = "+filterList)
>       val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>       val cells = results.rawCells()
>       if (cells != null) {
>               for (cell <- cells) {
>                 val value = new String(CellUtil.cloneValue(cell))
>                 val qualifier = new String(CellUtil.cloneQualifier(cell))
>                 val family = new String(CellUtil.cloneFamily(cell))
>                 val result = "OFFSET = "+offset+":"+family + "," + qualifier + "," + value + "," + cell.getTimestamp()
>                 resultsList.append(result)
>               }
>       }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 1:family,name,Gil,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> OFFSET = 3:family,name,John,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to return only a 
> single (non-duplicated) cell within a page, but not across pages.
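
The redundancy described in the report can be checked mechanically against the
printed output. The following is a minimal sketch in plain Scala with no HBase
dependency. PagedCell and RedundancyCheck are hypothetical names introduced
here for illustration and are not part of HBase; the sample data is copied
from the "Limit = 1" results quoted above. It assumes that cells sharing
family, qualifier and timestamp count as the same coordinate.

```scala
// Hypothetical model of one printed result line; not an HBase type.
final case class PagedCell(
    offset: Int,
    family: String,
    qualifier: String,
    value: String,
    timestamp: Long)

object RedundancyCheck {
  // Assumption: cells with the same family, qualifier and timestamp occupy
  // the same coordinate, so paging over columns should surface that
  // coordinate at most once across all offsets.
  // Returns each coordinate seen more than once, with the offsets it came from.
  def redundantCoordinates(
      cells: Seq[PagedCell]): Map[(String, String, Long), Seq[Int]] =
    cells
      .groupBy(c => (c.family, c.qualifier, c.timestamp))
      .collect {
        case (coord, hits) if hits.size > 1 =>
          coord -> hits.map(_.offset).sorted
      }

  // Reported output for limit = 1, MUST_PASS_ALL.
  val mustPassAll: Seq[PagedCell] = Seq(
    PagedCell(0, "family", "name", "Jane", 1000000000000L))

  // Reported output for limit = 1, MUST_PASS_ONE.
  val mustPassOne: Seq[PagedCell] = Seq(
    PagedCell(0, "family", "name", "Jane", 1000000000000L),
    PagedCell(1, "family", "name", "Gil", 1000000000000L),
    PagedCell(2, "family", "name", "Jane", 1000000000000L),
    PagedCell(3, "family", "name", "John", 1000000000000L))
}
```

With the reported data, redundantCoordinates(mustPassAll) is empty, while
redundantCoordinates(mustPassOne) flags the single coordinate
(family, name, 1000000000000) as returned at offsets 0, 1, 2 and 3.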



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
