[
https://issues.apache.org/jira/browse/SPARK-57088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan reassigned SPARK-57088:
-----------------------------------
Assignee: Zhidong Qu
> Allow non-deterministic ranking expression for EXACT NEAREST BY
> ---------------------------------------------------------------
>
> Key: SPARK-57088
> URL: https://issues.apache.org/jira/browse/SPARK-57088
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Zhidong Qu
> Assignee: Zhidong Qu
> Priority: Major
> Labels: pull-request-available
>
> Removes the `NEAREST_BY_JOIN.EXACT_WITH_NONDETERMINISTIC_EXPRESSION`
> rejection in `CheckAnalysis` so the `EXACT` mode of `NEAREST BY JOIN` (added
> in SPARK-56395) accepts non-deterministic ranking expressions, the same way
> `APPROX` already does.
>
> `APPROX` vs. `EXACT` and determinism are orthogonal concerns:
> * `APPROX` vs. `EXACT` is about the search algorithm contract: `APPROX`
> permits the optimizer to use faster approximate strategies (e.g. indexed
> ANN); `EXACT` forces brute-force evaluation.
> * Determinism is a property of the ranking expression itself. Ordinary
> joins, for example, accept non-deterministic join conditions without forcing
> the user into an "approximate" join.
>
> `EXACT` describes algebraic semantics ("compute the exact top-K according to
> the user's ranking expression"); it does not promise reproducibility across
> runs when the ranking expression is itself non-deterministic. Coupling the
> two was an over-restriction that this PR removes.
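
Editor's illustration (not part of the issue or the Spark code base): the distinction drawn above can be sketched in plain Python. The helper names `exact_nearest_by` and `make_noisy_distance` are hypothetical, and the brute-force loop only stands in for what an exact strategy would compute; it makes no claim about how Spark plans or executes `NEAREST BY JOIN`. The point is that the top-K is exact with respect to the values the ranking expression actually produced, whether or not those values are reproducible across runs.

```python
import heapq
import random


def exact_nearest_by(left, right, rank, k):
    """Brute-force top-k: evaluate the ranking expression on every pair."""
    result = {}
    for probe in left:
        # Evaluate the ranking expression once per pair, then rank on those
        # sampled values. No candidate is skipped, so the top-k is exact
        # relative to whatever values the expression returned.
        scored = [(rank(probe, cand), cand) for cand in right]
        best = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
        result[probe] = [cand for _, cand in best]
    return result


def make_noisy_distance(seed):
    """Hypothetical non-deterministic ranking expression: distance plus jitter."""
    rng = random.Random(seed)
    return lambda probe, cand: abs(probe - cand) + 5.0 * rng.random()


left = [0, 10]
right = [1, 2, 3, 9, 11, 12]

# Deterministic ranking expression: the result is reproducible.
deterministic = exact_nearest_by(left, right, lambda p, c: abs(p - c), k=2)

# Non-deterministic ranking expression: still an exact top-k over the sampled
# values, but two runs (modelled here by two seeds) may disagree.
run_a = exact_nearest_by(left, right, make_noisy_distance(seed=1), k=2)
run_b = exact_nearest_by(left, right, make_noisy_distance(seed=2), k=2)
```

In this sketch the search strategy (scan every candidate) and the determinism of `rank` vary independently, which mirrors the argument that `EXACT` constrains the algorithm rather than the reproducibility of the ranking expression.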