spapin opened a new pull request, #18587: URL: https://github.com/apache/pinot/pull/18587
## Summary Fixes https://github.com/apache/pinot/issues/10600 The `PredicateComparisonRewriter` previously rewrote column-to-column comparisons like `WHERE a = b` into `minus(a, b) = 0`. This arithmetic rewrite forced DOUBLE coercion on both operands, which: - **Broke string comparisons entirely** — `Double.parseDouble("some string")` throws `NumberFormatException` - **Lost precision for integer types** — large LONG values cast to DOUBLE can alias, making distinct values appear equal - **Affected both v1 and v2 engines** — the v2 leaf stage applies the same rewriter, so even multi-stage queries hit this bug The fix replaces the `minus()` rewrite with the type-safe comparison transform functions (`equals`, `not_equals`, `greater_than`, etc.) that already exist in `BinaryOperatorTransformFunction`. These dispatch on the actual stored types of both operands — `Integer.compare` for INT, `Long.compare` for LONG, `String.compareTo` for STRING — with no forced coercion. ### Why the rewrite is still needed The v1 filter engine requires a **literal on the RHS** of predicates. The rewrite moves the column-to-column comparison into a transform function, leaving a literal `true` on the RHS: ``` Before (broken): EQUALS(json_extract_scalar(...), other_col) ← column on RHS, v1 can't evaluate Old rewrite: EQUALS(minus(json_extract_scalar(...), other_col), 0) ← forced DOUBLE coercion, breaks strings New rewrite: EQUALS(equals(json_extract_scalar(...), other_col), true) ← type-safe per-row comparison ``` ### Related issues - https://github.com/apache/pinot/issues/10600: `WHERE string_col = string_col` throws `NumberFormatException` (open since Apr 2023) ### Changes - **`PredicateComparisonRewriter.java`** — `a op b` → `EQUALS(comparison_func(a, b), true)` instead of `EQUALS(minus(a, b), 0)` - **`PredicateComparisonRewriterTest.java`** — covers all 6 comparison operators, function-on-LHS, operand preservation, and BETWEEN rejection - **`JsonExtractScalarTest.java`** — end-to-end integration tests for string and numeric column-to-column comparisons via `json_extract_scalar` ## Test plan - [x] `PredicateComparisonRewriterTest` — verifies rewrite structure for `=`, `!=`, `<`, `>`, `>=` and function-on-LHS - [x] `JsonExtractScalarTest#testJsonExtractScalarComparedToColumn` — end-to-end across all 7 JSON column variants: - String `=` (no match → empty result, no error) - String `=` same expression both sides (all rows match) - String `!=` (all rows satisfy) - Numeric `=` (no match → empty result, no error) - Numeric `>` (all rows satisfy) - [x] Local quickstart: verified `json_extract_scalar(properties, '$.hs_name', 'STRING', '') = expectedName` returns correct matching rows -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
