XIANG,CHAO-ZHU created PHOENIX-6808: ---------------------------------------
Summary: query filter cannot get correct result Key: PHOENIX-6808 URL: https://issues.apache.org/jira/browse/PHOENIX-6808 Project: Phoenix Issue Type: Bug Components: 4.x, core Affects Versions: 4.14.1 Environment: hbase-1.4 phoenix-4.14.1-HBase-1.4 Reporter: XIANG,CHAO-ZHU Attachments: image-2022-10-10-18-41-09-059.png, image-2022-10-10-22-41-51-879.png, image-2022-10-10-23-10-01-059.png create table && upsert values {code:java} CREATE TABLE IF NOT EXISTS "TEST_TABLE" ( "K" VARCHAR PRIMARY KEY, "f"."JR1" UNSIGNED_INT DEFAULT 0, "f"."JR2" UNSIGNED_INT DEFAULT 0, "f"."ZN" UNSIGNED_INT DEFAULT 0, "f"."AA" UNSIGNED_INT, "f"."AB" UNSIGNED_INT DEFAULT 0, "f"."AC" INTEGER DEFAULT 0, "f"."AD" INTEGER ) column_encoded_bytes=0 SPLIT ON (0, 1, 2, 3, 4, 5, 6, 7, 8, 9); UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD) VALUES('5:xiang:346358075:370562729', 270, null, 10846, null, null, null, null); UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD) VALUES('0:chaozhu:123123120:321321321' , 123, null, 10846, 23303, 123, 321, 111); {code} then execute query: {code:java} select * from "TEST_TABLE" where ( "ZN" in (10846) and ("JR1" in (2303) or "JR2" in (10846) ));{code} result: !image-2022-10-10-18-41-09-059.png! Above, neither `JR1` nor `JR2` matches query statement but result contains 2 rows. And execute query (change the order): {code:java} select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and "ZN" in (10846) ); {code} result: !image-2022-10-10-22-24-52-310.png! Or execute query ( "ZN" in (10846) -> "AB" in ( 123 )): {code:java} select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2" in (10846) )); {code} result: !image-2022-10-10-23-10-01-059.png! I try to debug it by docker and remote debug. Maybe I know the cause of this problem. In {*}ComparisonExpression.java{*}(org/apache/phoenix/expression/ComparisonExpression.java) {code:java} @Override public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) { if (!children.get(0).evaluate(tuple, ptr)) { return false; } if (ptr.getLength() == 0) { // null comparison evals to null return true; } .... }{code} because `JR2` value is null, so `ptr.getLength() == 0` is true then return true directly. In *AndOrExpression.java* (org/apache/phoenix/expression/AndOrExpression.java) {code:java} public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) { boolean isNull = false; for (int i = 0; i < children.size(); i++) { Expression child = children.get(i); // If partial state is available, then use that to know we've already evaluated this // child expression and do not need to do so again. if (partialEvalState == null || !partialEvalState.get(i)) { // Call through to child evaluate method matching parent call to allow child to optimize // evaluate versus getValue code path. if (child.evaluate(tuple, ptr)) { // Short circuit if we see our stop value if (isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr, child.getDataType()))) { return true; } else if (partialEvalState != null) { partialEvalState.set(i); } } else { isNull = true; } } } if (isNull) { return false; } return true; } {code} because evaluate `JR2` return true so `("JR1" in (2303) or "JR2" in (10846))` is true. In *BooleanExpressionFilter.Java* (org/apache/phoenix/filter/BooleanExpressionFilter.java) {code:java} @Override public String toString() { return expression.toString(); } @edu.umd.cs.findbugs.annotations.SuppressWarnings( value="NP_BOOLEAN_RETURN_NULL", justification="Returns null by design.") protected Boolean evaluate(Tuple input) { try { if (!expression.evaluate(input, tempPtr)) { return null; } } catch (IllegalDataException e) { return Boolean.FALSE; } return (Boolean)expression.getDataType().toObject(tempPtr); }{code} now `(Boolean)expression.getDataType().toObject(tempPtr) = true` ( `"ZN" in (10846) is true` cause ) ** In *MultiKeyValueComparisonFilter.java* (org/apache/phoenix/filter/MultiKeyValueComparisonFilter.java) {code:java} public ReturnCode filterKeyValue(Cell cell) { ... // We found a new column, so we can re-evaluate // TODO: if we have row key columns in our expression, should // we always evaluate or just wait until the end? this.matchedColumn = this.evaluate(inputTuple); if (this.matchedColumn == null) { if (inputTuple.isImmutable()) { this.matchedColumn = Boolean.FALSE; } else { return ReturnCode.INCLUDE_AND_NEXT_COL; } } ... } public boolean filterRow() { if (this.matchedColumn == null && !inputTuple.isImmutable() && expression.requiresFinalEvaluation()) { inputTuple.setImmutable(); this.matchedColumn = this.evaluate(inputTuple); } return ! (Boolean.TRUE.equals(this.matchedColumn)); } {code} Above, filterKeyValue -> this.matchedColumn = true -> filterRow -> return false -> not filter row Thus, the reason of query: {code:java} select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and "ZN" in (10846) ); {code} `("JR1" in (2303) or "JR2" in (10846)` is true and `isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` -> `(Boolean)expression.getDataType().toObject(tempPtr) = null` -> this.matchedColumn = false -> filterRow -> return true -> filter row And, the reason of query: {code:java} select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2" in (10846) )); {code} hbase read cell and filter it by alphabetical order. first read "AB" and `"AB" in (123)` is true then `partialEvalState.set(0)`, then read "JR1" and set subfilter `partialEvalState.set(0)` final read "JR2" -> skip index 0 since partialEvalState.get(0) -> ("JR1" in (2303) or "JR2" in (10846) ) is true and PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` -> `(Boolean)expression.getDataType().toObject(tempPtr) = null` -> this.matchedColumn = false -> filterRow -> return true -> filter row -- This message was sent by Atlassian Jira (v8.20.10#820010)