Shekhar Prasad Rajak created SPARK-56826:
--------------------------------------------

             Summary: PushVariantIntoScan throws NPE / NoSuchElementException 
when invariants from upstream rules don't hold
                 Key: SPARK-56826
                 URL: https://issues.apache.org/jira/browse/SPARK-56826
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.1
            Reporter: Shekhar Prasad Rajak
             Fix For: 4.2.0


org.apache.spark.sql.execution.datasources.PushVariantIntoScan 
(RequestedVariantField companion) makes two assumptions about its inputs that 
hold under the default optimizer pipeline but are not validated locally:

 1.  VariantGet.path.eval() is non-null (relied on by path.eval().toString)
 2.  VariantGet.timeZoneId and Cast.timeZoneId are Some(_) (relied on by .get)

Logs:

[P1] threw java.lang.NullPointerException:
     Cannot invoke "Object.toString()" because the return value of
     "org.apache.spark.sql.catalyst.expressions.Expression.eval(...)" is null
[P2] threw java.util.NoSuchElementException: None.get
[P6] threw java.util.NoSuchElementException: None.get
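
For illustration, the two failure modes in the log reduce to the plain Scala patterns below. This is only a sketch: FakeExpr and the FailureModes object are hypothetical stand-ins and are not Spark's Expression, VariantGet, or Cast classes; only the call shapes (eval().toString and Option.get) are taken from the description above.

```scala
import scala.util.Try

// Hypothetical stand-in for an expression whose eval() may return null.
// This is NOT Spark's Expression class.
final case class FakeExpr(value: Any) {
  def eval(): Any = value
}

object FailureModes {
  // Same shape as path.eval().toString: throws NullPointerException
  // when eval() returns null.
  def pathString(path: FakeExpr): String = path.eval().toString

  // Same shape as timeZoneId.get: throws NoSuchElementException
  // when the option is None.
  def zone(timeZoneId: Option[String]): String = timeZoneId.get

  def main(args: Array[String]): Unit = {
    // Try captures the exception instead of letting it propagate.
    println(Try(pathString(FakeExpr(null))))
    println(Try(zone(None)))
  }
}
```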

Expected Behaviour
RequestedVariantField.apply(VariantGet) and RequestedVariantField.apply(Cast) 
should either:

•  Return a sensible RequestedVariantField by treating missing inputs 
defensively (e.g. fall back to SQLConf.get.sessionLocalTimeZone for missing tz; 
throw IllegalStateException with a clear message for null path), or
•  Be guarded at the call sites in collectRequestedFields / rewriteExpr.
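
A minimal sketch of the first option, assuming simplified stand-in types rather than the real Catalyst classes. FakeVariantGet, RequestedField, DefensiveApply, and the sessionTimeZone parameter are all hypothetical names introduced for illustration; in Spark the fallback value would presumably come from SQLConf.get.sessionLocalTimeZone, as suggested above.

```scala
// Hypothetical stand-ins; the real rule operates on Catalyst's VariantGet and Cast.
final case class FakeVariantGet(pathValue: Any, timeZoneId: Option[String])
final case class RequestedField(path: String, timeZone: String)

object DefensiveApply {
  // sessionTimeZone stands in for SQLConf.get.sessionLocalTimeZone (assumption).
  def apply(v: FakeVariantGet, sessionTimeZone: String): RequestedField = {
    val raw = v.pathValue
    if (raw == null) {
      // Fail with a clear message instead of an NPE from null.toString.
      throw new IllegalStateException(
        "VariantGet path evaluated to null; expected a non-null path")
    }
    // Fall back to the session time zone instead of calling .get on None.
    RequestedField(raw.toString, v.timeZoneId.getOrElse(sessionTimeZone))
  }
}
```

With these stand-ins, DefensiveApply(FakeVariantGet("$.a", None), "UTC") yields a field that carries the fallback zone, while a null path fails fast with IllegalStateException. The second option (guarding at the call sites) would instead skip such inputs before apply is reached.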



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
