Shekhar Prasad Rajak created SPARK-56826:
--------------------------------------------
Summary: PushVariantIntoScan throws NPE / NoSuchElementException
when invariants from upstream rules don't hold
Key: SPARK-56826
URL: https://issues.apache.org/jira/browse/SPARK-56826
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.1
Reporter: Shekhar Prasad Rajak
Fix For: 4.2.0
org.apache.spark.sql.execution.datasources.PushVariantIntoScan
(specifically the RequestedVariantField companion) makes two assumptions about
its inputs. Both hold under the default optimizer pipeline, but neither is
validated locally:
1. VariantGet.path.eval() is non-null (relied on by path.eval().toString)
2. VariantGet.timeZoneId and Cast.timeZoneId are Some(_) (relied on by .get)
Logs:
[P1] threw java.lang.NullPointerException:
Cannot invoke "Object.toString()" because the return value of
"org.apache.spark.sql.catalyst.expressions.Expression.eval(...)" is null
[P2] threw java.util.NoSuchElementException: None.get
[P6] threw java.util.NoSuchElementException: None.get
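The two exception types in the logs can be reproduced in isolation, without Spark. The sketch below is only an illustration: FakeLiteral and FakeVariantGet are hypothetical stand-ins, not Catalyst classes, and the two accessors merely mirror the path.eval().toString and timeZoneId.get patterns named above.

```scala
import scala.util.Try

// Hypothetical stand-ins for the Catalyst expressions; NOT Spark's real classes.
final case class FakeLiteral(value: Any) {
  def eval(): Any = value
}
final case class FakeVariantGet(path: FakeLiteral, timeZoneId: Option[String])

object UnguardedAccess {
  // Mirrors `path.eval().toString`: throws NullPointerException when eval() returns null.
  def pathString(vg: FakeVariantGet): String = vg.path.eval().toString

  // Mirrors `timeZoneId.get`: throws NoSuchElementException when the option is None.
  def timeZone(vg: FakeVariantGet): String = vg.timeZoneId.get
}

object UnguardedAccessDemo {
  def main(args: Array[String]): Unit = {
    val nullPath = FakeVariantGet(FakeLiteral(null), Some("UTC"))
    val noTimeZone = FakeVariantGet(FakeLiteral("$.a"), None)

    // Each call fails; Try captures the exception so both can be printed.
    println(Try(UnguardedAccess.pathString(nullPath)))
    println(Try(UnguardedAccess.timeZone(noTimeZone)))
  }
}
```

Neither failure says which expression or rule produced the bad input, which is what makes the original exceptions hard to diagnose.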
Expected Behaviour
RequestedVariantField.apply(VariantGet) and RequestedVariantField.apply(Cast)
should either:
• Return a sensible RequestedVariantField by handling missing inputs
defensively (e.g. fall back to SQLConf.get.sessionLocalTimeZone when the time
zone is missing; throw IllegalStateException with a clear message when the
path is null), or
• Be guarded at the call sites in collectRequestedFields / rewriteExpr.
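A minimal sketch of the first option, written as standalone Scala rather than against the Spark code base. Every name here (PathExpr, VariantGetLike, RequestedField, DefensiveRequestedField, and the sessionTz parameter standing in for SQLConf.get.sessionLocalTimeZone) is hypothetical, and the real RequestedVariantField signature may differ.

```scala
// Hypothetical stand-ins; not Spark's real classes or signatures.
final case class PathExpr(value: Any) {
  def eval(): Any = value
}
final case class VariantGetLike(path: PathExpr, timeZoneId: Option[String])
final case class RequestedField(path: String, timeZoneId: String)

object DefensiveRequestedField {
  // sessionTz is an assumed stand-in for SQLConf.get.sessionLocalTimeZone.
  def fromVariantGet(vg: VariantGetLike, sessionTz: String): RequestedField = {
    // Option(...) turns a null eval() result into None, so the failure can be
    // reported with a descriptive message instead of a bare NullPointerException.
    val path = Option(vg.path.eval()).map(_.toString).getOrElse(
      throw new IllegalStateException(
        "VariantGet path evaluated to null; expected a non-null path"))

    // A missing time zone falls back to the session default instead of None.get.
    RequestedField(path, vg.timeZoneId.getOrElse(sessionTz))
  }
}
```

For example, DefensiveRequestedField.fromVariantGet(VariantGetLike(PathExpr("$.a"), None), "UTC") yields RequestedField("$.a", "UTC"), while a null path raises IllegalStateException with the message above. Guarding at the call sites instead would keep the constructor strict and skip the pushdown for such expressions; which is preferable is left open here.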