spapin opened a new pull request, #18587:
URL: https://github.com/apache/pinot/pull/18587

   ## Summary
   
   Fixes https://github.com/apache/pinot/issues/10600
   
   The `PredicateComparisonRewriter` previously rewrote column-to-column 
comparisons like `WHERE a = b` into `minus(a, b) = 0`. This arithmetic rewrite 
forced DOUBLE coercion on both operands, which:
   
   - **Broke string comparisons entirely** — `Double.parseDouble("some 
string")` throws `NumberFormatException`
   - **Lost precision for integer types** — large LONG values cast to DOUBLE 
can alias, making distinct values appear equal
   - **Affected both v1 and v2 engines** — the v2 leaf stage applies the same 
rewriter, so even multi-stage queries hit this bug
   
   The fix replaces the `minus()` rewrite with the type-safe comparison 
transform functions (`equals`, `not_equals`, `greater_than`, etc.) that already 
exist in `BinaryOperatorTransformFunction`. These dispatch on the actual stored 
types of both operands — `Integer.compare` for INT, `Long.compare` for LONG, 
`String.compareTo` for STRING — with no forced coercion.
   
   ### Why the rewrite is still needed
   
   The v1 filter engine requires a **literal on the RHS** of predicates. The 
rewrite moves the column-to-column comparison into a transform function, 
leaving a literal `true` on the RHS:
   
   ```
   Before (broken):  EQUALS(json_extract_scalar(...), other_col)          ← 
column on RHS, v1 can't evaluate
   Old rewrite:      EQUALS(minus(json_extract_scalar(...), other_col), 0) ← 
forced DOUBLE coercion, breaks strings
   New rewrite:      EQUALS(equals(json_extract_scalar(...), other_col), true) 
← type-safe per-row comparison
   ```
   
   ### Related issues
   
   - https://github.com/apache/pinot/issues/10600: `WHERE string_col = 
string_col` throws `NumberFormatException` (open since Apr 2023)
   
   ### Changes
   
   - **`PredicateComparisonRewriter.java`** — `a op b` → 
`EQUALS(comparison_func(a, b), true)` instead of `EQUALS(minus(a, b), 0)`
   - **`PredicateComparisonRewriterTest.java`** — covers all 6 comparison 
operators, function-on-LHS, operand preservation, and BETWEEN rejection
   - **`JsonExtractScalarTest.java`** — end-to-end integration tests for string 
and numeric column-to-column comparisons via `json_extract_scalar`
   
   ## Test plan
   
   - [x] `PredicateComparisonRewriterTest` — verifies rewrite structure for 
`=`, `!=`, `<`, `>`, `>=` and function-on-LHS
   - [x] `JsonExtractScalarTest#testJsonExtractScalarComparedToColumn` — 
end-to-end across all 7 JSON column variants:
     - String `=` (no match → empty result, no error)
     - String `=` same expression both sides (all rows match)
     - String `!=` (all rows satisfy)
     - Numeric `=` (no match → empty result, no error)
     - Numeric `>` (all rows satisfy)
   - [x] Local quickstart: verified `json_extract_scalar(properties, 
'$.hs_name', 'STRING', '') = expectedName` returns correct matching rows


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to