Yohei Kishimoto created HBASE-26863:
---------------------------------------

             Summary: Rowkey pushdown does not work with complex conditions
                 Key: HBASE-26863
                 URL: https://issues.apache.org/jira/browse/HBASE-26863
             Project: HBase
          Issue Type: Bug
          Components: hbase-connectors
    Affects Versions: connector-1.0.0
            Reporter: Yohei Kishimoto


When using the pushdown column filter feature of hbase-spark-connector, a 
complex query that contains rowkey conditions does not get the expected rowkey 
pushdown.
{code:json}
{
  "table":{"namespace":"default", "name":"t1"},
  "rowkey":"key",
  "columns":{
    "KEY_FIELD":{"cf":"rowkey", "col":"key", "type":"string"},
    "A_FIELD":{"cf":"c", "col":"a", "type":"string"},
    "B_FIELD":{"cf":"c", "col":"b", "type":"string"}
  }
}
{code}
For example, given the catalog above, the query `spark.sql("SELECT * FROM table WHERE 
KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3' AND A_FIELD IS NOT NULL")` gets an 
incomplete rowkey pushdown in which the lower bound is lost 
(ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:,isLowerBoundEqualTo:true)).

If the query is `spark.sql("SELECT * FROM table WHERE KEY_FIELD >= 'get1' AND 
KEY_FIELD <= 'get3'")`, the rowkey pushdown is correct 
(ScanRange:(upperBound:get3,isUpperBoundEqualTo:true,lowerBound:get1,isLowerBoundEqualTo:true)).
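
As a minimal sketch of the expected behavior: an AND of rowkey predicates should 
push down the tightest bounds, i.e. the greatest `>=` value as the lower bound 
and the smallest `<=` value as the upper bound, while a predicate on a 
non-rowkey column such as `A_FIELD IS NOT NULL` places no constraint on the 
rowkey. The Java below assumes hypothetical `Pred` and `ExpectedPushdown` 
classes (not part of hbase-connectors), handles only the inclusive `>=`/`<=` 
operators used in the example, and compares keys with `String#compareTo`, which 
matches HBase's unsigned byte order for ASCII keys like these.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of one WHERE-clause predicate: column, operator, literal.
final class Pred {
    final String column;
    final String op;
    final String value;

    Pred(String column, String op, String value) {
        this.column = column;
        this.op = op;
        this.value = value;
    }
}

// Hypothetical helper, not a connector class: derives the expected rowkey bounds.
final class ExpectedPushdown {
    // Rowkey column from the catalog ("rowkey":"key" is mapped to KEY_FIELD).
    static final String ROWKEY = "KEY_FIELD";

    // Tightest lower bound: the greatest value among "KEY_FIELD >= x" predicates.
    // Optional.empty() means the scan has no lower bound.
    static Optional<String> lowerBound(List<Pred> preds) {
        return preds.stream()
                .filter(p -> p.column.equals(ROWKEY) && p.op.equals(">="))
                .map(p -> p.value)
                .max(Comparator.naturalOrder());
    }

    // Tightest upper bound: the smallest value among "KEY_FIELD <= x" predicates.
    // Optional.empty() means the scan has no upper bound.
    static Optional<String> upperBound(List<Pred> preds) {
        return preds.stream()
                .filter(p -> p.column.equals(ROWKEY) && p.op.equals("<="))
                .map(p -> p.value)
                .min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // WHERE KEY_FIELD >= 'get1' AND KEY_FIELD <= 'get3' AND A_FIELD IS NOT NULL
        List<Pred> where = List.of(
                new Pred("KEY_FIELD", ">=", "get1"),
                new Pred("KEY_FIELD", "<=", "get3"),
                new Pred("A_FIELD", "IS NOT NULL", null)); // not a rowkey predicate: ignored
        System.out.println("lower=" + lowerBound(where).orElse("(none)")
                + " upper=" + upperBound(where).orElse("(none)"));
    }
}
{code}

Running `main` prints `lower=get1 upper=get3`. Because `max` and `min` do not 
depend on the order of the stream, this formulation returns the same bounds 
however the predicates are ordered.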

I found that ScanRange#getOverlapScanRange and ScanRange#mergeIntersect return 
incorrect results when the range passed as the argument is wider than the 
instance, i.e. scanRange1.getOverlapScanRange(scanRange2) where 
scanRange1 ⊂ scanRange2. The order of the scan ranges these methods receive 
depends on the order of the Filters produced by Spark's optimizer, so for some 
queries the ranges arrive in the order that triggers the problem.
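
The symptom can be illustrated with a simplified model: intersecting scan ranges 
should be commutative, so `a` overlapped with `b` must equal `b` overlapped with 
`a`, including when one range contains the other. The Java sketch below uses a 
hypothetical `KeyRange` class (not the connector's `ScanRange`) whose `overlap` 
keeps the tighter bound on each side regardless of which range is the receiver, 
then folds it over every ordering of the three ranges from the first query. Null 
bounds mean unbounded, and empty intersections are not detected.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a rowkey scan range; NOT the connector's ScanRange.
// A null bound means "unbounded on that side". Keys are compared with String#compareTo,
// which agrees with HBase's unsigned byte order for ASCII keys such as "get1".
final class KeyRange {
    final String lower;
    final boolean lowerInclusive;
    final String upper;
    final boolean upperInclusive;

    KeyRange(String lower, boolean lowerInclusive, String upper, boolean upperInclusive) {
        this.lower = lower;
        this.lowerInclusive = lowerInclusive;
        this.upper = upper;
        this.upperInclusive = upperInclusive;
    }

    // Keeps the tighter bound on each side, so the result does not depend on which
    // range is the receiver. Equal bounds stay inclusive only if both are inclusive.
    // This sketch does not detect an empty result (lower > upper).
    KeyRange overlap(KeyRange o) {
        String lo; boolean loInc;
        if (lower == null || (o.lower != null && o.lower.compareTo(lower) > 0)) {
            lo = o.lower; loInc = o.lowerInclusive;
        } else if (o.lower != null && o.lower.equals(lower)) {
            lo = lower; loInc = lowerInclusive && o.lowerInclusive;
        } else {
            lo = lower; loInc = lowerInclusive;
        }
        String hi; boolean hiInc;
        if (upper == null || (o.upper != null && o.upper.compareTo(upper) < 0)) {
            hi = o.upper; hiInc = o.upperInclusive;
        } else if (o.upper != null && o.upper.equals(upper)) {
            hi = upper; hiInc = upperInclusive && o.upperInclusive;
        } else {
            hi = upper; hiInc = upperInclusive;
        }
        return new KeyRange(lo, loInc, hi, hiInc);
    }

    @Override
    public String toString() {
        String left = lower == null ? "(-inf" : (lowerInclusive ? "[" : "(") + lower;
        String right = upper == null ? "+inf)" : upper + (upperInclusive ? "]" : ")");
        return left + ", " + right;
    }
}

final class OrderCheck {
    // Folds overlap() over the ranges in the given order.
    static KeyRange overlapAll(List<KeyRange> ranges) {
        KeyRange acc = ranges.get(0);
        for (int i = 1; i < ranges.size(); i++) {
            acc = acc.overlap(ranges.get(i));
        }
        return acc;
    }

    // All orderings of the input; fine for the three ranges used here.
    static List<List<KeyRange>> permutations(List<KeyRange> items) {
        List<List<KeyRange>> out = new ArrayList<>();
        if (items.isEmpty()) {
            out.add(new ArrayList<>());
            return out;
        }
        for (int i = 0; i < items.size(); i++) {
            List<KeyRange> rest = new ArrayList<>(items);
            KeyRange head = rest.remove(i);
            for (List<KeyRange> tail : permutations(rest)) {
                tail.add(0, head);
                out.add(tail);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        KeyRange ge = new KeyRange("get1", true, null, true);  // KEY_FIELD >= 'get1'
        KeyRange le = new KeyRange(null, true, "get3", true);  // KEY_FIELD <= 'get3'
        KeyRange all = new KeyRange(null, true, null, true);   // A_FIELD IS NOT NULL: rowkey unconstrained
        for (List<KeyRange> order : permutations(List.of(ge, le, all))) {
            System.out.println(order + " -> " + overlapAll(order));
        }
    }
}
{code}

Each of the six orderings yields `[get1, get3]`. A fix in the connector would 
need the same property from ScanRange#getOverlapScanRange and 
ScanRange#mergeIntersect for either argument order.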



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
