[jira] [Commented] (HBASE-17678) FilterList with MUST_PASS_ONE may lead to redundant cells returned

Anoop Sam John (JIRA) Wed, 07 Jun 2017 00:25:58 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040388#comment-16040388
 ]


Anoop Sam John commented on HBASE-17678:
----------------------------------------

bq. Got your point. Currently, the cell is a ref to a slice in the ByteBuff, so 
we had better clone the cell into a new cell (only the key is needed) and save it 
in prevCellList? For a large row, the newly cloned cell will take more heap 
memory, but it won't affect the ByteBuff GC.
Yes, a larger RK is not a big concern. Anyway, it cannot be that long and we 
can't do anything else. In some places we follow a shipped() callback model to 
keep previous cell refs of this kind. You can see it in places like SQM. I am 
not sure how good it is to make a Filter, which is client exposed, a shipper 
observer. Maybe not good. So yes, should we clone the key part and store 
that? I am just giving some options. I am not sure of any better ways, as I did 
not look closely at the code.
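
To make the "clone the key part and store that" option concrete, here is a 
minimal, self-contained sketch. It does not use the real HBase classes: 
SliceCell, cloneKey and the "row/family:name" key layout are hypothetical names 
invented for illustration, and a plain java.nio.ByteBuffer stands in for the 
ByteBuff. The only point is that a heap copy of the key stays valid after the 
backing block is reused, so the filter would not need a shipped() callback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyCloneSketch {

    /** Hypothetical cell that only points at a slice of a shared block (not an HBase class). */
    static final class SliceCell {
        final ByteBuffer block;
        final int keyOffset;
        final int keyLength;

        SliceCell(ByteBuffer block, int keyOffset, int keyLength) {
            this.block = block;
            this.keyOffset = keyOffset;
            this.keyLength = keyLength;
        }
    }

    /** Copies only the key bytes onto the heap, leaving the shared block untouched. */
    static byte[] cloneKey(SliceCell cell) {
        byte[] copy = new byte[cell.keyLength];
        // duplicate() shares the bytes but has its own position, so the source is not disturbed
        ByteBuffer view = cell.block.duplicate();
        view.position(cell.keyOffset);
        view.get(copy);
        return copy;
    }

    public static void main(String[] args) {
        byte[] raw = "row/family:name/value=John".getBytes(StandardCharsets.UTF_8);
        ByteBuffer block = ByteBuffer.allocateDirect(raw.length);
        block.put(raw);
        block.flip();

        SliceCell cell = new SliceCell(block, 0, "row/family:name".length());
        byte[] prevKey = cloneKey(cell); // what a filter could keep as its previous cell key

        // Simulate the block being recycled for other data
        for (int i = 0; i < raw.length; i++) {
            block.put(i, (byte) 'x');
        }

        // The heap copy is unaffected by the reuse
        System.out.println(new String(prevKey, StandardCharsets.UTF_8));
    }
}
```

The cost is one extra byte[] of key length per remembered cell, which matches 
the trade-off discussed above: a bit more heap, but no reference that pins the 
shared buffer.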

> FilterList with MUST_PASS_ONE may lead to redundant cells returned
> ------------------------------------------------------------------
>
>                 Key: HBASE-17678
>                 URL: https://issues.apache.org/jira/browse/HBASE-17678
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 2.0.0, 1.3.0, 1.2.1
>         Environment: RedHat 7.x
>            Reporter: Jason Tokayer
>            Assignee: Zheng Hu
>         Attachments: HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch, 
> HBASE-17678.v2.patch, HBASE-17678.v3.patch, HBASE-17678.v4.patch, 
> HBASE-17678.v4.patch, HBASE-17678.v5.patch, HBASE-17678.v6.patch, 
> HBASE-17678.v7.patch, HBASE-17678.v7.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I would believe that MUST_PASS_ALL and 
> MUST_PASS_ONE should only affect the behavior of the joined filter and not 
> the behavior of any one of the individual filters. If this is not a bug then 
> it would be nice if the documentation is updated to explain this nuanced 
> behavior.
> I know that there was a decision made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> put 'ns:tbl','row','family:name','Gil',1000000000000
> put 'ns:tbl','row','family:name','Jane',1000000000000
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>       val table = connection.getTable(TableName.valueOf("ns:tbl"))
>       val paginationFilter = new ColumnPaginationFilter(limit,offset)
>       val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
>       println("@ filterList = "+filterList)
>       val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
>       val cells = results.rawCells()
>       if (cells != null) {
>               for (cell <- cells) {
>                 val value = new String(CellUtil.cloneValue(cell))
>                 val qualifier = new String(CellUtil.cloneQualifier(cell))
>                 val family = new String(CellUtil.cloneFamily(cell))
>                 val result = "OFFSET = "+offset+":"+family + "," + qualifier + "," + value + "," + cell.getTimestamp()
>                 resultsList.append(result)
>               }
>       }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 1:family,name,Gil,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> OFFSET = 3:family,name,John,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1000000000000
> OFFSET = 2:family,name,Jane,1000000000000
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to give only a 
> single (not-duplicated) cell within a page, but not across pages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
