[
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036735#comment-16036735
]
Zheng Hu edited comment on HBASE-17678 at 6/5/17 9:50 AM:
----------------------------------------------------------
[~zghaobac], I created a mock filter to test whether the cell passed to each
filter in the filter list is the expected cell (patch v4), and found some
problems in FilterList.java:
1. FilterList does not handle the INCLUDE_AND_SEEK_NEXT_ROW case (it seems that
INCLUDE_AND_SEEK_NEXT_ROW is a newly added return code, and FilterList was not
updated for it). So if a developer returns INCLUDE_AND_SEEK_NEXT_ROW from a
custom Filter that is wrapped by a FilterList, it will throw
IllegalStateException("Received code is not valid.").
2. For a FilterList with MUST_PASS_ONE, if filter-A in the filter list returns
INCLUDE and filter-B in the filter list returns INCLUDE_AND_NEXT_COL, the
FilterList finally returns INCLUDE_AND_NEXT_COL. According to the minimal
step rule, this is incorrect. (A filter list with MUST_PASS_ONE should choose
the minimal step among the filters in the list. Let's call it the Minimal Step
Rule.)
I opened another issue, HBASE-18160, for the above problems; let's fix this
issue first.
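
To make the Minimal Step Rule concrete, here is a minimal, self-contained sketch. It does not use HBase classes and is not the FilterList implementation: the enum below is a hypothetical stand-in that covers only the three include-type codes named above, and the ordering by step size is my assumption about what "minimal step" means.

```java
import java.util.Arrays;
import java.util.List;

public class MinimalStepSketch {
    /**
     * Hypothetical stand-in for the include-type return codes named above,
     * declared from the smallest step to the largest (an assumption).
     */
    enum ReturnCode {
        INCLUDE,                   // include the cell, move to the next cell
        INCLUDE_AND_NEXT_COL,      // include the cell, skip the rest of the column
        INCLUDE_AND_SEEK_NEXT_ROW  // include the cell, skip the rest of the row
    }

    /** MUST_PASS_ONE merge under the Minimal Step Rule: keep the smallest step. */
    static ReturnCode mergeMustPassOne(List<ReturnCode> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("no return codes to merge");
        }
        ReturnCode merged = codes.get(0);
        for (ReturnCode code : codes) {
            // Enum constants compare by declaration order, i.e. by step size here.
            if (code.compareTo(merged) < 0) {
                merged = code;
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // filter-A returns INCLUDE, filter-B returns INCLUDE_AND_NEXT_COL.
        System.out.println(mergeMustPassOne(Arrays.asList(
            ReturnCode.INCLUDE, ReturnCode.INCLUDE_AND_NEXT_COL))); // prints INCLUDE
        // A lone INCLUDE_AND_SEEK_NEXT_ROW is handled instead of being rejected.
        System.out.println(mergeMustPassOne(Arrays.asList(
            ReturnCode.INCLUDE_AND_SEEK_NEXT_ROW))); // prints INCLUDE_AND_SEEK_NEXT_ROW
    }
}
```

With this rule, the case in item 2 merges to INCLUDE rather than INCLUDE_AND_NEXT_COL, so a filter that only asked for INCLUDE still sees the following cells of the same column.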
> FilterList with MUST_PASS_ONE lead to redundancy cells returned
> ---------------------------------------------------------------
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
> Reporter: Jason Tokayer
> Assignee: Zheng Hu
> Attachments: HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch,
> HBASE-17678.v2.patch, HBASE-17678.v3.patch, HBASE-17678.v4.patch,
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element FilterList,
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are
> multiple cells with the same timestamp. This is unexpected since there is
> only a single filter in the list, and I would expect MUST_PASS_ALL and
> MUST_PASS_ONE to affect only the behavior of the joined filter and not
> the behavior of any one of the individual filters. If this is not a bug, then
> it would be nice if the documentation were updated to explain this nuanced
> behavior.
> I know that a decision was made in an earlier HBase version to keep
> multiple cells with the same timestamp. This is generally fine but presents
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> put 'ns:tbl','row','family:name','Gil',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> {code}
> Then use a Scala client as follows:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
>
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> // Fetch one page per iteration, advancing the offset by `limit` each time.
> for (offset <- 0 to 20 by limit) {
>   val table = connection.getTable(TableName.valueOf("ns:tbl"))
>   val paginationFilter = new ColumnPaginationFilter(limit, offset)
>   val filterList: FilterList = new FilterList(logicalOp, paginationFilter)
>   println("@ filterList = " + filterList)
>   val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>     for (cell <- cells) {
>       val value = new String(CellUtil.cloneValue(cell))
>       val qualifier = new String(CellUtil.cloneQualifier(cell))
>       val family = new String(CellUtil.cloneFamily(cell))
>       // Keep the trailing `+` on the first line so the expression continues.
>       val result = "OFFSET = " + offset + ":" + family + "," + qualifier +
>         "," + value + "," + cell.getTimestamp()
>       resultsList.append(result)
>     }
>   }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 1:family,name,Gil,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> OFFSET = 3:family,name,John,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to return only a
> single (non-duplicated) cell within a page, but not across pages.