Zhidong Qu created SPARK-57088:
----------------------------------

             Summary: Allow non-deterministic ranking expression for EXACT NEAREST BY
                 Key: SPARK-57088
                 URL: https://issues.apache.org/jira/browse/SPARK-57088
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Zhidong Qu


This change removes the `NEAREST_BY_JOIN.EXACT_WITH_NONDETERMINISTIC_EXPRESSION`
rejection in `CheckAnalysis`, so that the `EXACT` mode of `NEAREST BY JOIN`
(added in SPARK-56395) accepts non-deterministic ranking expressions, as
`APPROX` already does.

`APPROX` vs. `EXACT` and determinism are orthogonal concerns:
 * `APPROX` vs. `EXACT` is about the search algorithm contract: `APPROX` 
permits the optimizer to use faster approximate strategies (e.g. indexed ANN); 
`EXACT` forces brute-force evaluation.
 * Determinism is a property of the ranking expression itself. Ordinary joins, 
for example, accept non-deterministic join conditions without forcing the user 
into an "approximate" join.

`EXACT` describes algebraic semantics ("compute the exact top-K according to 
the user's ranking expression"); it does not promise reproducibility across 
runs when the ranking expression is itself non-deterministic. Coupling the two 
was an over-restriction that this PR removes.
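
To illustrate the distinction, here is a minimal sketch in plain Python. It is
not Spark code and does not use the `NEAREST BY JOIN` syntax. The helper names
(`exact_top_k`, `make_noisy_distance`, `run`) and the jitter-based ranking
function are hypothetical and chosen only for illustration. The sketch assumes
that "exact" means: score every candidate once, then return the true top-K of
those scores. Under that assumption the result is exact with respect to the
scores evaluated in a given run, even though another run may evaluate
different scores.

```python
import heapq
import random


def exact_top_k(query, candidates, rank, k):
    """Brute-force top-K: score every candidate once, keep the k smallest."""
    scored = [(rank(query, c), c) for c in candidates]
    return heapq.nsmallest(k, scored), scored


def make_noisy_distance(rng):
    """Hypothetical non-deterministic ranking: distance plus random jitter."""
    def rank(q, c):
        return abs(q - c) + rng.uniform(0.0, 2.0)
    return rank


candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
query = 3.0


def run(seed):
    """One 'query execution' with its own source of randomness."""
    rank = make_noisy_distance(random.Random(seed))
    return exact_top_k(query, candidates, rank, k=2)


if __name__ == "__main__":
    for seed in (1, 2, 3):
        top, scored = run(seed)
        # Exactness holds within each run: top is the true top-2 of the
        # scores that this run evaluated.
        assert top == sorted(scored)[:2]
        print(seed, [c for _, c in top])
```

The selected candidates may change from one run to the next, but in every run
the returned rows are exactly the K best according to the scores evaluated in
that run. That is the sense in which exactness and reproducibility are
separate properties.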



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

