viirya opened a new issue, #925:
URL: https://github.com/apache/datafusion-comet/issues/925

   ### Describe the bug
   
   In #924, we found that Spark sometimes produces an exchange whose partitioning expression cannot be resolved correctly, because it references an attribute that is not in the exchange's child output.
   
   For example:
   
   ```
   +- TransformWithState value#667.toString, newInstance(class org.apache.spark.sql.streaming.InputMapRow), [value#667], [key#659, action#660, value#661], org.apache.spark.sql.streaming.TestMapStateProcessor@58fc42f6, NoTime, Append, class[value[0]: string], obj#671: scala.Tuple3, state info [ checkpoint = , runId = 9af20b3e-feb8-4ccd-a9f0-b3ed1517330a, opId = 0, ver = 0, numPartitions = 5], 1725862230745, false, false, [value#667], [key#659, action#660, value#661], value#667.toString
      :- Sort [value#667 ASC NULLS FIRST], false, 0
      :  +- Exchange hashpartitioning(value#667, 5), ENSURE_REQUIREMENTS, [plan_id=1124]
      :     +- AppendColumns org.apache.spark.sql.streaming.TransformWithMapStateSuite$$Lambda$2590/0x000000f801e1c3d0@488fe08d, newInstance(class org.apache.spark.sql.streaming.InputMapRow), [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false, true) AS value#667]
      :        +- LocalTableScan [key#659, action#660, value#661]
      +- !Sort [value#667 ASC NULLS FIRST], false, 0
         +- !Exchange hashpartitioning(value#667, 5), ENSURE_REQUIREMENTS, [plan_id=1125]
            +- LocalTableScan <empty>, [value#672]
   ```
   
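   This causes a resolution error in Comet when it tries to translate the partitioning expression of the second exchange (`plan_id=1125`), which references `value#667` while its child only outputs `value#672`: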
   
   ```
   [info] - transformWithMapState - batch should succeed (without changelog checkpointing) *** FAILED *** (23 milliseconds)
   [info]   org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find value#667 in [value#672] SQLSTATE: XX000
   [info]   at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
   [info]   at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
   [info]   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:81)
   [info]   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:458)
   [info]   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:458)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:434)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:402)
   [info]   at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
   [info]   at org.apache.comet.serde.QueryPlanSerde$.exprToProtoInternal$1(QueryPlanSerde.scala:1714)
   [info]   at org.apache.comet.serde.QueryPlanSerde$.exprToProto(QueryPlanSerde.scala:2565)
   [info]   at org.apache.comet.serde.QueryPlanSerde$.$anonfun$supportPartitioning$1(QueryPlanSerde.scala:3184)
   ```
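   The lookup in this trace fails because Catalyst binds attributes by expression ID, not by name: `value#667` and `value#672` are both named `value` but are different attributes. Below is a minimal, self-contained Scala model of that lookup. The names `BindSketch`, `Attr`, `bind` and `canBind` are hypothetical and are not Spark or Comet APIs. `canBind` sketches one possible pre-check, under the assumption that Comet could decline to translate a partitioning whose references are missing from the child's output rather than fail.

   ```scala
   // Hypothetical, simplified model of Catalyst attribute binding.
   // BindSketch, Attr, bind and canBind are illustrative names, NOT Spark or Comet APIs.
   object BindSketch {
     // An attribute is identified by its expression ID; `name#id` mirrors the plan output.
     final case class Attr(name: String, exprId: Long) {
       override def toString: String = s"$name#$exprId"
     }

     // Returns the ordinal of `a` in `input`, matching on exprId rather than on
     // the name, and fails with a message shaped like the one in the trace.
     def bind(a: Attr, input: Seq[Attr]): Int = {
       val ordinal = input.indexWhere(_.exprId == a.exprId)
       if (ordinal < 0)
         throw new IllegalStateException(
           s"Couldn't find $a in ${input.mkString("[", ",", "]")}")
       ordinal
     }

     // Assumed pre-check: true only if every referenced attribute is present
     // in the child's output, so translation could be skipped otherwise.
     def canBind(refs: Seq[Attr], input: Seq[Attr]): Boolean =
       refs.forall(r => input.exists(_.exprId == r.exprId))

     def main(args: Array[String]): Unit = {
       val key = Attr("value", 667)
       println(bind(key, Seq(Attr("value", 667))))         // left exchange child: prints 0
       println(canBind(Seq(key), Seq(Attr("value", 672)))) // right exchange child: prints false
     }
   }
   ```

   In this model, binding `value#667` against the left exchange's child output succeeds, while the right exchange's child output (`[value#672]`) fails the check even though the attribute names match.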
   
   
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_
