[ 
https://issues.apache.org/jira/browse/SPARK-57088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57088:
-----------------------------------
    Labels: pull-request-available  (was: )

> Allow non-deterministic ranking expression for EXACT NEAREST BY
> ---------------------------------------------------------------
>
>                 Key: SPARK-57088
>                 URL: https://issues.apache.org/jira/browse/SPARK-57088
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Zhidong Qu
>            Priority: Major
>              Labels: pull-request-available
>
> Removes the `NEAREST_BY_JOIN.EXACT_WITH_NONDETERMINISTIC_EXPRESSION` 
> rejection in `CheckAnalysis` so the `EXACT` mode of `NEAREST BY JOIN` (added 
> in SPARK-56395) accepts non-deterministic ranking expressions, the same way 
> `APPROX` already does.
>
> `APPROX` vs. `EXACT` and determinism are orthogonal concerns:
>  * `APPROX` vs. `EXACT` is about the search algorithm contract: `APPROX` 
> permits the optimizer to use faster approximate strategies (e.g. indexed 
> ANN); `EXACT` forces brute-force evaluation.
>  * Determinism is a property of the ranking expression itself. Ordinary 
> joins, for example, accept non-deterministic join conditions without forcing 
> the user into an "approximate" join.
>
> `EXACT` describes algebraic semantics ("compute the exact top-K according to 
> the user's ranking expression"); it does not promise reproducibility across 
> runs when the ranking expression is itself non-deterministic. Coupling the 
> two was an over-restriction that this PR removes.
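
The distinction drawn in the description can be illustrated outside Spark. The following is a minimal Python sketch, not Spark code and not the PR's implementation: `exact_top_k` and `noisy_distance` are hypothetical names made up for this illustration. It assumes "exact" means brute-force scoring of every candidate once and keeping the K best, and it uses a ranking function with random jitter to stand in for a non-deterministic ranking expression.

```python
import heapq
import random


def exact_top_k(query, candidates, k, rank):
    """Brute-force top-K: score every candidate exactly once, keep the k best.

    `rank` may be non-deterministic; the result is exact with respect to the
    scores that were actually produced during this call.
    """
    scored = [(rank(query, c), c) for c in candidates]
    # Lower score means "nearer"; nsmallest keeps ties in input order.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def noisy_distance(rng):
    """Hypothetical ranking expression: absolute distance plus random jitter."""
    def rank(query, candidate):
        return abs(query - candidate) + rng.uniform(0.0, 0.5)
    return rank


if __name__ == "__main__":
    candidates = [1.0, 4.0, 6.0, 6.2, 9.0]
    # Two runs with unseeded generators may pick different neighbours,
    # yet each run is an exact top-2 over the scores it sampled.
    for _ in range(2):
        print(exact_top_k(5.0, candidates, 2, noisy_distance(random.Random())))
```

In this sketch the search strategy (score everything, no approximation) and the reproducibility of the ranking function are independent knobs, which is the sense in which the description calls the two concerns orthogonal.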



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
