Hello,
I am having some difficulty understanding the results when I apply a
ColumnPaginationFilter within a FilterList. I’m not sure whether this is an
HBase bug or a gap in my understanding of how the API works.
Specifically, I’m seeing a difference between MUST_PASS_ONE and
MUST_PASS_ALL in my FilterList even when the list contains only a single
filter. Below I walk through a full but simplified example that illustrates
the issue. (I removed the other filters from the list once I had narrowed
down the problem, but I do still need to use a FilterList.)
First, in the shell I create a table and insert multiple values with the same
timestamp:
create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
put 'ns:tbl','row','family:name','John',1000000000000
put 'ns:tbl','row','family:name','Jane',1000000000000
put 'ns:tbl','row','family:name','Gil',1000000000000
put 'ns:tbl','row','family:name','Jane',1000000000000
Now, I create a custom client written in Scala that uses the Java APIs:
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
import scala.collection.mutable.ListBuffer

val config = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "localhost")
config.set("hbase.zookeeper.property.clientPort", "2181")
val connection = ConnectionFactory.createConnection(config)
// Open the table once instead of once per loop iteration.
val table = connection.getTable(TableName.valueOf("ns:tbl"))

// Switch between MUST_PASS_ALL and MUST_PASS_ONE to reproduce the difference.
val logicalOp = FilterList.Operator.MUST_PASS_ALL
val limit = 1
val resultsList = ListBuffer[String]()

// Issue one Get per page, advancing the offset by `limit` each time.
for (offset <- 0 to 20 by limit) {
  val paginationFilter = new ColumnPaginationFilter(limit, offset)
  val filterList = new FilterList(logicalOp, paginationFilter)
  val results = table.get(new Get(Bytes.toBytes("row")).setFilter(filterList))
  // Guard against a null cell array for an empty Result.
  val cells = results.rawCells()
  if (cells != null) {
    for (cell <- cells) {
      val value = Bytes.toString(CellUtil.cloneValue(cell))
      val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell))
      val family = Bytes.toString(CellUtil.cloneFamily(cell))
      val result = s"OFFSET = $offset:$family,$qualifier,$value,${cell.getTimestamp}"
      println(result)
      resultsList.append(result)
    }
  }
}
table.close()
My results look like:
limit = 1 & logicalOp = MUST_PASS_ALL:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
limit = 1 & logicalOp = MUST_PASS_ONE:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
OFFSET = 1:family,name,Gil,1000000000000
OFFSET = 2:family,name,Jane,1000000000000
OFFSET = 3:family,name,John,1000000000000
limit = 2 & logicalOp = MUST_PASS_ALL:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
limit = 2 & logicalOp = MUST_PASS_ONE:
scala> resultsList.foreach(println)
OFFSET = 0:family,name,Jane,1000000000000
OFFSET = 2:family,name,Jane,1000000000000
My main question is: why, when using MUST_PASS_ONE, do I not get back only
the single, most recently inserted value of the cell, as I do when I use
MUST_PASS_ALL? Note that if I don’t use a FilterList at all and instead just
set the Get’s filter to the paginationFilter, I get the result I would expect
(i.e. the single OFFSET = 0:family,name,Jane,1000000000000).
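To make my expectation concrete, here is a minimal in-memory sketch of the
behavior I assumed. It does not touch HBase at all: StoredCell,
ColumnPaginationModel and page are hypothetical names of my own, and the
assumption that the filter pages over distinct column qualifiers (rather than
over cell versions) is only my reading of the documentation, not confirmed
behavior.

```scala
// Simplified model of what I expected; NOT the HBase implementation.
// Assumption: pagination counts distinct column qualifiers, not versions.
final case class StoredCell(qualifier: String, value: String, timestamp: Long)

object ColumnPaginationModel {
  // Keep the most recently written cell per qualifier, then page over
  // the qualifiers in sorted order.
  def page(cells: Seq[StoredCell], limit: Int, offset: Int): Seq[StoredCell] = {
    val latestPerColumn = cells
      .groupBy(_.qualifier)                          // one group per column
      .toSeq
      .sortBy { case (qualifier, _) => qualifier }
      .map { case (_, versions) => versions.last }   // last element = latest put
    latestPerColumn.slice(offset, offset + limit)
  }

  // The four puts from the shell session above, in insertion order.
  val row: Seq[StoredCell] = Seq(
    StoredCell("name", "John", 1000000000000L),
    StoredCell("name", "Jane", 1000000000000L),
    StoredCell("name", "Gil", 1000000000000L),
    StoredCell("name", "Jane", 1000000000000L)
  )
}
```

Under this model, page(row, 1, 0) yields the single Jane cell and every
offset of 1 or more yields nothing, because the row has only one column.
That matches what I see with MUST_PASS_ALL and with the bare filter, but not
with MUST_PASS_ONE.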
The documentation isn’t entirely clear about this situation, and I’m hoping
someone on either mailing list may be able to assist.
Best,
Jason