[
https://issues.apache.org/jira/browse/SPARK-48473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carmen Kwan updated SPARK-48473:
--------------------------------
Component/s: SQL
(was: Spark Core)
> Add extensible trait to allow-list non-deterministic expressions in operators
> in CheckAnalysis
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-48473
> URL: https://issues.apache.org/jira/browse/SPARK-48473
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0, 3.5.2
> Reporter: Carmen Kwan
> Priority: Major
>
> CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception
> when an operator contains a non-deterministic expression and that operator
> is not allow-listed in the case match check
> [below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
>
> {code:java}
> case o if o.expressions.exists(!_.deterministic) &&
>     !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
>     !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
>     !o.isInstanceOf[Expand] &&
>     !o.isInstanceOf[Generate] &&
>     // Lateral join is checked in checkSubqueryExpression.
>     !o.isInstanceOf[LateralJoin] =>
>   // The rule above is used to check Aggregate operator.
>   o.failAnalysis(
>     errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
>     messageParameters = Map("sqlExprs" ->
>       o.expressions.map(toSQLExpr(_)).mkString(", "))
>   )
> {code}
>
> It would be useful to add a generic trait or class that this case match
> allow-lists, so that when a non-deterministic expression living in another
> repository needs to be allow-listed, we don't have to wait for a new
> Spark release. For example, in Delta Lake, we want to allow-list a specific
> non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause
> operator as part of Delta's [Identity Column
> implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner
> overall to add an abstract generic class there than to put Delta-specific
> logic into this CheckAnalysis rule.
> It would be beneficial to backport this to Spark 3.5 so that we don't need to
> wait for Spark 4 to benefit from this low-risk change.
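
The proposal above could look roughly like the following. This is only a minimal sketch, not the actual Spark change: `Expression`, `LogicalPlan`, `Project`, `Literal` and `Rand` are simplified stand-ins for the Catalyst classes, and the trait name `AllowsNonDeterministicExpressions`, its method, `SomeOtherOperator`, `ExternalMergeClause` and `CheckAnalysisSketch` are hypothetical names chosen for illustration. The built-in allow list is also cut down to `Project` to keep the example short.

```scala
// Simplified stand-ins for Catalyst types; these are NOT the real Spark classes.
trait Expression { def deterministic: Boolean }
case class Literal(value: Int) extends Expression {
  override def deterministic: Boolean = true
}
case class Rand(seed: Long) extends Expression {
  override def deterministic: Boolean = false
}

trait LogicalPlan { def expressions: Seq[Expression] }

// Hypothetical extension point: an operator mixes this in to opt in to
// carrying non-deterministic expressions.
trait AllowsNonDeterministicExpressions extends LogicalPlan {
  def allowNonDeterministicExpressions: Boolean
}

// Stand-in for an operator that is already allow-listed by type.
case class Project(expressions: Seq[Expression]) extends LogicalPlan

// Stand-in for an operator that is not allow-listed.
case class SomeOtherOperator(expressions: Seq[Expression]) extends LogicalPlan

// Stand-in for an operator defined in another repository (hypothetical name).
case class ExternalMergeClause(expressions: Seq[Expression])
    extends AllowsNonDeterministicExpressions {
  override def allowNonDeterministicExpressions: Boolean = true
}

object CheckAnalysisSketch {
  // Returns Left(errorClass) where the real rule would call failAnalysis.
  def check(plan: LogicalPlan): Either[String, Unit] = plan match {
    // New case: operators that opted in through the trait are accepted
    // without the rule knowing their concrete type.
    case o: AllowsNonDeterministicExpressions
        if o.allowNonDeterministicExpressions =>
      Right(())
    // Existing case, reduced to a single built-in allow-listed operator.
    case o if o.expressions.exists(!_.deterministic) &&
        !o.isInstanceOf[Project] =>
      Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS")
    case _ => Right(())
  }
}
```

In this sketch an external project only has to mix the trait into its own operator; the check never refers to that project's classes, which is the decoupling the description asks for.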
--
This message was sent by Atlassian Jira
(v8.20.10#820010)