Carmen Kwan created SPARK-48473:
-----------------------------------

             Summary: Add extensible trait to allow-list non-deterministic 
expressions in operators in CheckAnalysis
                 Key: SPARK-48473
                 URL: https://issues.apache.org/jira/browse/SPARK-48473
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0, 3.5.2
            Reporter: Carmen Kwan


CheckAnalysis throws an `INVALID_NON_DETERMINISTIC_EXPRESSIONS` exception when 
a non-deterministic expression appears in an operator that is not allow-listed 
in the case match check 
[below|https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L773-L784]:
 
{code:java}
case o if o.expressions.exists(!_.deterministic) &&
  !o.isInstanceOf[Project] && !o.isInstanceOf[Filter] &&
  !o.isInstanceOf[Aggregate] && !o.isInstanceOf[Window] &&
  !o.isInstanceOf[Expand] &&
  !o.isInstanceOf[Generate] &&
  // Lateral join is checked in checkSubqueryExpression.
  !o.isInstanceOf[LateralJoin] =>
  // The rule above is used to check Aggregate operator.
  o.failAnalysis(
    errorClass = "INVALID_NON_DETERMINISTIC_EXPRESSIONS",
    messageParameters = Map("sqlExprs" -> o.expressions.map(toSQLExpr(_)).mkString(", "))
  )
{code}
 

It would be nice to add a generic trait or class to the allow list in this case 
match, so that when non-deterministic expressions that live in other 
repositories need to be allow-listed, we don't need to wait for a new Spark 
release. For example, in Delta Lake, we want to allow-list a specific 
non-deterministic expression for the DeltaMergeIntoMatchedUpdateClause operator 
as part of Delta's [Identity Column 
implementation|https://github.com/delta-io/delta/issues/1959]. It is cleaner 
overall to add an abstract generic class there than to put Delta-specific logic 
into this CheckAnalysis rule.
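
As a rough illustration of the idea, the sketch below models the check with simplified stand-in types rather than the real Catalyst classes. The trait name `AllowsNonDeterministicExpressions`, its method, and the `ExternalUpdateClause` operator are hypothetical names chosen for this example; the actual API would be decided in the implementation.

```scala
// Simplified stand-ins for Catalyst types. These are NOT the real Spark classes.
trait Expression {
  def deterministic: Boolean
  def sql: String
}
case class Literal(value: Int) extends Expression {
  val deterministic = true
  def sql: String = value.toString
}
case class Rand(seed: Long) extends Expression {
  val deterministic = false
  def sql: String = s"rand($seed)"
}

trait LogicalPlan {
  def expressions: Seq[Expression]
}

// Hypothetical opt-in trait (the name is an assumption). An operator mixes it
// in to declare that it may hold non-deterministic expressions.
trait AllowsNonDeterministicExpressions extends LogicalPlan {
  def allowNonDeterministicExpressions: Boolean = true
}

// Project is on the built-in allow list; Sort is not.
case class Project(expressions: Seq[Expression]) extends LogicalPlan
case class Sort(expressions: Seq[Expression]) extends LogicalPlan

// Stand-in for an operator defined outside Spark, for example in Delta Lake.
case class ExternalUpdateClause(expressions: Seq[Expression])
  extends AllowsNonDeterministicExpressions

object CheckAnalysisSketch {
  // True only for operators that mix in the trait and keep the flag enabled.
  private def optedIn(plan: LogicalPlan): Boolean = plan match {
    case a: AllowsNonDeterministicExpressions => a.allowNonDeterministicExpressions
    case _ => false
  }

  // Mirrors the shape of the case match above, with one extra generic condition
  // in place of operator-specific logic from other repositories.
  def check(plan: LogicalPlan): Either[String, Unit] = plan match {
    case o if o.expressions.exists(!_.deterministic) &&
        !o.isInstanceOf[Project] &&
        !optedIn(o) =>
      Left("INVALID_NON_DETERMINISTIC_EXPRESSIONS: " +
        o.expressions.map(_.sql).mkString(", "))
    case _ => Right(())
  }
}
```

With this shape, an external project only needs to mix the trait into its own operator; the case match in CheckAnalysis does not have to name that operator.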

It would be beneficial to backport this to Spark 3.5 so that we don't need to 
wait for Spark 4 to benefit from this low-risk change.


