Re: [I] Invalid call to dataType on unresolved object [incubator-gluten]


JkSelf commented on issue #10103:
URL: 
https://github.com/apache/incubator-gluten/issues/10103#issuecomment-3031305640

@wenfang6 Spark uses the `spark.sql.legacy.allowHashOnMapType` configuration
to control whether hash expressions accept map-type inputs. Gluten turns this
configuration on when it creates the ColumnarShuffleExchange
https://github.com/apache/incubator-gluten/blob/main/backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxSparkPlanExecApi.scala#L355-L363,
which bypasses Spark's unresolved checks.

However, your SQL plan contains two adjacent projects as shown below:
```
p1.projectList:
1. hash(features#2, geek_position#0) as hash_partition_key
2. features#2
3. geek_position#0

p2.projectList:
1. features#2
2. geek_position#0
```

Gluten applies the `CollapseProjectExecTransformer` rule to combine these
two projects into a single `collapseProject`
https://github.com/apache/incubator-gluten/blob/main/gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/CollapseProjectExecTransformer.scala#L40-L41.
Unfortunately, the `collapseProject` remains unresolved because the
configuration is set to false
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L291-L296.
Could you please test with the `spark.sql.legacy.allowHashOnMapType`
configuration enabled?
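
One way to try this is to set the flag when building the session. The
following is a minimal sketch, not taken from the issue: the object name, the
app name, the local master, and the sample query are all assumptions for
illustration, and the Gluten plugin settings are left out, so add the ones your
job already uses.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical test harness; only the configuration key comes from the discussion above.
object AllowHashOnMapTypeCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("allow-hash-on-map-type-check") // assumed name
      .master("local[*]")                      // assumed local run
      // Let hash expressions accept map-type inputs.
      .config("spark.sql.legacy.allowHashOnMapType", "true")
      .getOrCreate()

    // Sample query (an assumption): hash over a map-type value,
    // which Spark rejects when the flag is false.
    spark.sql("SELECT hash(map(1, 'a')) AS hash_partition_key").show()

    spark.stop()
  }
}
```

The same key can also be set on an existing session with
`spark.conf.set("spark.sql.legacy.allowHashOnMapType", "true")` before running
the failing query.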

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
