[ 
https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carmen Kwan updated SPARK-48473:
--------------------------------
    Fix Version/s: 4.0.0
                   3.5.2

> Add extensible trait to allow-list non-deterministic expressions in operators 
> in CheckAnalysis
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48473
>                 URL: https://issues.apache.org/jira/browse/SPARK-48473
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.2
>            Reporter: Carmen Kwan
>            Priority: Major
>             Fix For: 4.0.0, 3.5.2
>
>
> CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception 
> when an operator contains a non-deterministic expression and is not on the 
> allow list in the case match check 
> [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
>  
> {code:scala}
> case o if o.expressions.exists(!_.deterministic) &&
>     !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>     !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
>     !o.isInstanceOf[Expand] &&
>     !o.isInstanceOf[Generate] &&
>     // Lateral join is checked in checkSubqueryExpression.
>     !o.isInstanceOf[LateralJoin] =>
>   // The rule above is used to check Aggregate operator.
>   o.failAnalysis(
>     errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
>     messageParameters = Map("sqlExprs" -> o.expressions.map(toSQLExpr(_)).mkString(", "))
>   )
> {code}
>  
> It would be nice to add a generic trait/class to this case match that is 
> allow-listed, so that when new non-deterministic expressions that live in 
> other repositories need to be allow-listed, we don't need to wait for a new 
> Spark release. For example, in Delta Lake, we want to allow-list a specific 
> non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause 
> operator as part of Delta's [Identity Column 
> implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner 
> overall to add an abstract generic class there than to put Delta-specific 
> logic into this CheckAnalysis rule.
> It would be beneficial to backport this to Spark 3.5 so that we don't need to 
> wait for Spark 4 to benefit from this low-risk change.
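
The proposal in the description can be illustrated with a small self-contained sketch. It does not use Spark's classes: `Expr`, `Plan`, `Project`, `Sort`, `AllowsNonDeterministicExpressions`, `ExternalMergeClause`, and `ToyCheckAnalysis` are all hypothetical stand-ins invented here, and the real Catalyst rule covers more operators and raises an analysis exception rather than returning an `Either`. The only point is the shape of the idea: the check matches on an opt-in trait, so an operator defined in another repository can allow-list itself without the rule naming it.

```scala
// Toy stand-ins for Catalyst types; none of these are Spark's real classes.
final case class Expr(sql: String, deterministic: Boolean)

trait Plan {
  def expressions: Seq[Expr]
}

// Operators the toy rule knows about directly.
final case class Project(expressions: Seq[Expr]) extends Plan
final case class Sort(expressions: Seq[Expr]) extends Plan

// Hypothetical opt-in trait: an operator mixes it in to declare that it
// may carry non-deterministic expressions.
trait AllowsNonDeterministicExpressions extends Plan {
  def allowNonDeterministicExpression: Boolean
}

// An operator that could live in a different repository (assumed name).
final case class ExternalMergeClause(expressions: Seq[Expr])
    extends Plan with AllowsNonDeterministicExpressions {
  val allowNonDeterministicExpression: Boolean = true
}

object ToyCheckAnalysis {
  // Left(message) models a failed analysis; Right(()) means the plan passed.
  def check(plan: Plan): Either[String, Unit] = plan match {
    // Hard-coded allow list, as in the existing case match.
    case _: Project => Right(())
    // Extension point: no knowledge of the concrete operator is needed.
    case p: AllowsNonDeterministicExpressions
        if p.allowNonDeterministicExpression => Right(())
    case o if o.expressions.exists(!_.deterministic) =>
      Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS: " +
        o.expressions.map(_.sql).mkString(", "))
    case _ => Right(())
  }
}
```

With this shape, `ToyCheckAnalysis.check(ExternalMergeClause(Seq(Expr("rand()", deterministic = false))))` passes, while the same expression inside `Sort` is rejected. Modeling the flag as a method rather than a bare marker trait is a design assumption of this sketch; it lets an operator decide per instance whether to opt in.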



--
This message was sent by Atlassian Jira
(v8.20.10#820010)