Zhidong Qu created SPARK-57088:
----------------------------------
Summary: Allow non-deterministic ranking expression for EXACT NEAREST BY
Key: SPARK-57088
URL: https://issues.apache.org/jira/browse/SPARK-57088
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.2.0
Reporter: Zhidong Qu
Removes the `NEAREST_BY_JOIN.EXACT_WITH_NONDETERMINISTIC_EXPRESSION` rejection
in `CheckAnalysis` so the `EXACT` mode of `NEAREST BY JOIN` (added in
SPARK-56395) accepts non-deterministic ranking expressions, the same way
`APPROX` already does.
`APPROX` vs. `EXACT` and determinism are orthogonal concerns:
* `APPROX` vs. `EXACT` is about the search algorithm contract: `APPROX`
permits the optimizer to use faster approximate strategies (e.g. indexed ANN);
`EXACT` forces brute-force evaluation.
* Determinism is a property of the ranking expression itself. Ordinary joins,
for example, accept non-deterministic join conditions without forcing the user
into an "approximate" join.
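The two axes above can be illustrated with a small plain-Python sketch. It is not Spark code: `exact_top_k`, `approx_top_k`, and the random-subset strategy are hypothetical stand-ins chosen only to show that the search strategy and the ranking function are independent parameters. How Spark actually implements `APPROX` is not described here.

```python
import heapq
import random

def exact_top_k(candidates, k, rank):
    """Brute force: evaluate rank once per candidate, keep the k smallest."""
    scored = [(rank(c), c) for c in candidates]
    return [c for _, c in heapq.nsmallest(k, scored, key=lambda p: p[0])]

def approx_top_k(candidates, k, rank, sample_size, rng):
    """Hypothetical approximate strategy: rank only a random subset."""
    subset = rng.sample(candidates, min(sample_size, len(candidates)))
    return exact_top_k(subset, k, rank)

query = 10
candidates = list(range(20))
noise = random.Random()

# Axis 2: determinism is a property of the ranking function alone.
deterministic_rank = lambda c: abs(c - query)
nondeterministic_rank = lambda c: abs(c - query) + noise.uniform(0.0, 2.0)

# Axis 1: the strategy is chosen independently of the ranking function,
# so all four combinations are well defined.
exact_det = exact_top_k(candidates, 3, deterministic_rank)
exact_nondet = exact_top_k(candidates, 3, nondeterministic_rank)
approx_det = approx_top_k(candidates, 3, deterministic_rank, 8, noise)
approx_nondet = approx_top_k(candidates, 3, nondeterministic_rank, 8, noise)
```

Neither function inspects whether `rank` is deterministic, which is the sense in which the two concerns are orthogonal.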
`EXACT` describes algebraic semantics ("compute the exact top-K according to
the user's ranking expression"); it does not promise reproducibility across
runs when the ranking expression is itself non-deterministic. Coupling the two
was an over-restriction that this PR removes.
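As a rough illustration of that contract, the following plain-Python sketch (again not Spark code; `exact_top_k_with_scores` and `noisy_rank` are hypothetical names) evaluates a non-deterministic ranking function once per candidate and returns the exact top-K for the scores drawn in that run. Two runs may return different rows, yet each result is exact relative to its own scores.

```python
import random

def exact_top_k_with_scores(candidates, k, rank):
    """Score every candidate exactly once, then take the k lowest scores."""
    scored = [(rank(c), c) for c in candidates]
    scored.sort(key=lambda p: p[0])
    return scored[:k], scored

def noisy_rank(query, rng):
    """Non-deterministic ranking: distance plus random noise in [0, 1)."""
    return lambda c: abs(c - query) + rng.random()

candidates = list(range(20))

# Two independent runs with fresh randomness; the selected rows may differ.
top_a, all_a = exact_top_k_with_scores(candidates, 3, noisy_rank(10, random.Random()))
top_b, all_b = exact_top_k_with_scores(candidates, 3, noisy_rank(10, random.Random()))

# Within each run the result is still exact: no unselected candidate
# has a lower score than any selected one.
for top, scored in ((top_a, all_a), (top_b, all_b)):
    assert max(s for s, _ in top) <= min(s for s, _ in scored[len(top):])
```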