Yohei Kishimoto created HBASE-26863:
---------------------------------------
Summary: Rowkey pushdown does not work with complex conditions
Key: HBASE-26863
URL: https://issues.apache.org/jira/browse/HBASE-26863
Project: HBase
Issue Type: Bug
Components: hbase-connectors
Affects Versions: connector-1.0.0
Reporter: Yohei Kishimoto
When using the pushdown column filter feature of the hbase-spark connector, issuing a
complex query that contains rowkey conditions does not produce the expected rowkey
pushdown.
{code:java}
{
  "table":{"namespace":"default", "name":"t1"},
  "rowkey":"key",
  "columns":{
    "KEY_FIELD":{"cf":"rowkey", "col":"key", "type":"string"},
    "A_FIELD":{"cf":"c", "col":"a", "type":"string"},
    "B_FIELD":{"cf":"c", "col":"b", "type":"string"}
  }
}
{code}
For example, given the catalog above, the query {{spark.sql("SELECT * FROM table WHERE
KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3' AND A_FIELD IS NOT NULL")}} gets an
incomplete rowkey pushdown
(ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:,isLowerBoundEqualTo:true));
the lower bound {{get1}} is lost. If the query is {{spark.sql("SELECT * FROM table WHERE
KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3'")}}, we get the expected rowkey pushdown
(ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:get1,isLowerBoundEqualTo:true)).
I found that ScanRange#getOverlapScanRange and ScanRange#mergeIntersect return
incorrect results when the range passed as the argument is wider than the receiver
instance (i.e. scanRange1.getOverlapScanRange(scanRange2) where scanRange1 ⊂
scanRange2). Depending on the order of the Filters produced by the Spark optimizer,
these methods can receive the scan ranges in exactly the order that triggers this
problem.
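For reference, the intersection of two rowkey ranges should be symmetric: intersecting with a wider (even unbounded) range must not discard the narrower range's bounds. The following is a minimal, self-contained sketch of that expected semantics using a hypothetical {{Range}} record; it is *not* the connector's actual ScanRange implementation, only an illustration of the behavior the bug violates:

```java
// Hypothetical illustration of the expected intersection semantics;
// not the actual org.apache.hadoop.hbase.spark ScanRange code.
public class RangeIntersectDemo {

    // A string rowkey range; a null bound means unbounded on that side.
    record Range(String lower, boolean lowerInclusive,
                 String upper, boolean upperInclusive) {

        // Intersection must be symmetric: a.intersect(b) equals b.intersect(a),
        // even when one range fully contains the other.
        Range intersect(Range other) {
            // Take the greater (tighter) lower bound.
            String lo; boolean loInc;
            if (lower == null) {
                lo = other.lower; loInc = other.lowerInclusive;
            } else if (other.lower == null || lower.compareTo(other.lower) > 0) {
                lo = lower; loInc = lowerInclusive;
            } else if (lower.compareTo(other.lower) < 0) {
                lo = other.lower; loInc = other.lowerInclusive;
            } else { // equal bounds: inclusive only if both are inclusive
                lo = lower; loInc = lowerInclusive && other.lowerInclusive;
            }
            // Take the lesser (tighter) upper bound.
            String hi; boolean hiInc;
            if (upper == null) {
                hi = other.upper; hiInc = other.upperInclusive;
            } else if (other.upper == null || upper.compareTo(other.upper) < 0) {
                hi = upper; hiInc = upperInclusive;
            } else if (upper.compareTo(other.upper) > 0) {
                hi = other.upper; hiInc = other.upperInclusive;
            } else {
                hi = upper; hiInc = upperInclusive && other.upperInclusive;
            }
            return new Range(lo, loInc, hi, hiInc);
        }
    }

    public static void main(String[] args) {
        // key >= 'get1' AND key <= 'get3'
        Range narrow = new Range("get1", true, "get3", true);
        // unbounded range, e.g. contributed by an unrelated filter
        Range wide = new Range(null, true, null, true);

        // Both argument orders must yield [get1, get3]; the report describes
        // the wider-argument order losing the lower bound.
        System.out.println(narrow.intersect(wide));
        System.out.println(wide.intersect(narrow));
    }
}
```

With symmetric intersection, the filter order produced by the Spark optimizer would no longer matter.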
--
This message was sent by Atlassian Jira
(v8.20.1#820001)