[I] Comet fails to translate partitioning expressions for unresolvable Spark expressions [datafusion-comet]

via GitHub Sun, 08 Sep 2024 23:17:37 -0700


viirya opened a new issue, #925:
URL: https://github.com/apache/datafusion-comet/issues/925


   ### Describe the bug
   
   In #924, we found that Spark sometimes produces an exchange whose 
partitioning expression cannot be resolved against the output of the 
exchange's child. 
   
   For example:
   
   ```
   +- TransformWithState value#667.toString, newInstance(class org.apache.spark.sql.streaming.InputMapRow), [value#667], [key#659, action#660, value#661], org.apache.spark.sql.streaming.TestMapStateProcessor@58fc42f6, NoTime, Append, class[value[0]: string], obj#671: scala.Tuple3, state info [ checkpoint = , runId = 9af20b3e-feb8-4ccd-a9f0-b3ed1517330a, opId = 0, ver = 0, numPartitions = 5], 1725862230745, false, false, [value#667], [key#659, action#660, value#661], value#667.toString
      :- Sort [value#667 ASC NULLS FIRST], false, 0
      :  +- Exchange hashpartitioning(value#667, 5), ENSURE_REQUIREMENTS, [plan_id=1124]
      :     +- AppendColumns org.apache.spark.sql.streaming.TransformWithMapStateSuite$$Lambda$2590/0x000000f801e1c3d0@488fe08d, newInstance(class org.apache.spark.sql.streaming.InputMapRow), [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false, true) AS value#667]
      :        +- LocalTableScan [key#659, action#660, value#661]
      +- !Sort [value#667 ASC NULLS FIRST], false, 0
         +- !Exchange hashpartitioning(value#667, 5), ENSURE_REQUIREMENTS, [plan_id=1125]
            +- LocalTableScan <empty>, [value#672]
   ```
   
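   This causes a resolution error when Comet tries to translate the 
partitioning expressions. In the second branch of the plan, the exchange 
partitions on `value#667`, but its child outputs only `value#672` (Spark 
prefixes nodes with missing input attributes with `!`), so binding the 
reference fails: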
   
   ```
   [info] - transformWithMapState - batch should succeed (without changelog checkpointing) *** FAILED *** (23 milliseconds)
   [info]   org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find value#667 in [value#672] SQLSTATE: XX000
   [info]   at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
   [info]   at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
   [info]   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:81)
   [info]   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:458)
   [info]   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:458)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:434)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:402)
   [info]   at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
   [info]   at org.apache.comet.serde.QueryPlanSerde$.exprToProtoInternal$1(QueryPlanSerde.scala:1714)
   [info]   at org.apache.comet.serde.QueryPlanSerde$.exprToProto(QueryPlanSerde.scala:2565)
   [info]   at org.apache.comet.serde.QueryPlanSerde$.$anonfun$supportPartitioning$1(QueryPlanSerde.scala:3184)
   ```
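
The failure mode in the trace can be modeled in a few lines. The sketch below is only an illustration in plain Python, not Comet or Spark code: `Attribute`, `BindingError`, `bind_reference`, and `can_translate_partitioning` are hypothetical names made up for this example. It assumes that binding matches attributes by expression ID against the child's output, and it shows one possible guard (checking that every partitioning attribute is present before translating), which is not necessarily the fix Comet adopts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a Spark attribute reference such as value#667."""
    name: str
    expr_id: int

    def __str__(self) -> str:
        return f"{self.name}#{self.expr_id}"


class BindingError(Exception):
    """Raised when an attribute is absent from the child's output."""


def bind_reference(attr, child_output):
    """Return the ordinal of attr in child_output, matching on expr_id only."""
    for ordinal, candidate in enumerate(child_output):
        if candidate.expr_id == attr.expr_id:
            return ordinal
    found = ", ".join(str(a) for a in child_output)
    raise BindingError(f"Couldn't find {attr} in [{found}]")


def can_translate_partitioning(partition_exprs, child_output):
    """Guard: True only if the child produces every partitioning attribute."""
    available = {a.expr_id for a in child_output}
    return all(a.expr_id in available for a in partition_exprs)


# Both exchanges in the plan use hashpartitioning(value#667, 5).
partitioning = [Attribute("value", 667)]
# First exchange: the child (AppendColumns) produces value#667.
good_child = [Attribute("value", 667)]
# Second exchange: the child (LocalTableScan <empty>) outputs only value#672.
bad_child = [Attribute("value", 672)]

print(bind_reference(partitioning[0], good_child))           # 0
print(can_translate_partitioning(partitioning, good_child))  # True
print(can_translate_partitioning(partitioning, bad_child))   # False
```

Calling `bind_reference(partitioning[0], bad_child)` raises `BindingError` with the same "Couldn't find value#667 in [value#672]" message as the trace. With a guard like `can_translate_partitioning`, a caller could decline to translate the partitioning for that exchange instead of hitting an internal error.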
   
   
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

