Carmen Kwan created SPARK-48473:
-----------------------------------
Summary: Add extensible trait to allow-list non-deterministic
expressions in operators in CheckAnalysis
Key: SPARK-48473
URL: https://issues.apache.org/jira/browse/SPARK-48473
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0, 3.5.2
Reporter: Carmen Kwan
CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when
an operator contains a non-deterministic expression and that operator is not
allow-listed in the case match check
[below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
{code:java}
case o if o.expressions.exists(!_.deterministic) &&
  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
  !o.isInstanceOf[Expand] &&
  !o.isInstanceOf[Generate] &&
  // Lateral join is checked in checkSubqueryExpression.
  !o.isInstanceOf[LateralJoin] =>
  // The rule above is used to check Aggregate operator.
  o.failAnalysis(
    errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
    messageParameters = Map("sqlExprs" ->
      o.expressions.map(toSQLExpr(_)).mkString(", "))
  )
{code}
It would be nice to add a generic, allow-listed trait/class to this case match
so that when non-deterministic expressions that live in other repositories
need to be allow-listed, we don't have to wait for a new Spark release. For
example, in Delta Lake, we want to allow-list a specific non-deterministic
expression for the DeltaMergeIntoMatchedUpdateClause operator as part of
Delta's [Identity Column
implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner
overall to add an abstract generic class there than to put Delta-specific logic
into this CheckAnalysis rule.
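
To illustrate the idea, here is a rough, self-contained sketch of how such a marker trait could be consulted before the existing check. It deliberately uses simplified stand-ins for the Catalyst classes so it compiles without Spark. The trait name `AllowsNonDeterministicExpressions`, its method, the `ExternalMergeClause` operator, and the `CheckAnalysisSketch.check` helper are all hypothetical and only meant to show the shape of the change, not the final API.

```scala
// Simplified stand-ins for Catalyst types; these are NOT the real Spark classes.
trait Expression {
  def deterministic: Boolean
  def sql: String
}

case class Rand(seed: Long) extends Expression {
  val deterministic = false
  val sql = s"rand($seed)"
}

case class Literal(value: Any) extends Expression {
  val deterministic = true
  val sql = value.toString
}

trait LogicalPlan {
  def expressions: Seq[Expression]
}

// Hypothetical marker trait: an operator opts in to non-deterministic expressions.
trait AllowsNonDeterministicExpressions extends LogicalPlan {
  def allowNonDeterministicExpression: Boolean
}

// Stand-ins for an operator already on the allow list and one that is not.
case class Project(expressions: Seq[Expression]) extends LogicalPlan
case class Sort(expressions: Seq[Expression]) extends LogicalPlan

// Stand-in for an operator defined outside Spark (for example in Delta Lake).
case class ExternalMergeClause(expressions: Seq[Expression])
  extends AllowsNonDeterministicExpressions {
  val allowNonDeterministicExpression = true
}

object CheckAnalysisSketch {
  // Returns Left(message) where the real rule would call failAnalysis.
  def check(plan: LogicalPlan): Either[String, Unit] = plan match {
    // New case: operators that opt in through the trait skip the check.
    case o: AllowsNonDeterministicExpressions if o.allowNonDeterministicExpression =>
      Right(())
    // Existing check, reduced to a single built-in exemption for brevity.
    case o if o.expressions.exists(!_.deterministic) && !o.isInstanceOf[Project] =>
      Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS: " + o.expressions.map(_.sql).mkString(", "))
    case _ =>
      Right(())
  }
}
```

With something like this in place, an external project would only need to mix the trait into its own operator; the case match in CheckAnalysis would not have to name that operator.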
It would be beneficial to backport this to Spark 3.5 so that we don't need to
wait for Spark 4 to benefit from this low-risk change.