huaxingao commented on code in PR #1830:
URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2127844183
##########
common/src/main/java/org/apache/comet/parquet/TypeUtil.java:
##########
@@ -74,7 +74,7 @@ public static ColumnDescriptor convertToParquet(StructField field) {
       builder = Types.primitive(PrimitiveType.PrimitiveTypeName.INT64, repetition);
     } else if (type == DataTypes.BinaryType) {
       builder = Types.primitive(PrimitiveType.PrimitiveTypeName.BINARY, repetition);
-    } else if (type == DataTypes.StringType) {
+    } else if (type == DataTypes.StringType || type.sameType(DataTypes.StringType)) {
Review Comment:
This is to support string collation in Spark 4.0 (e.g. [test("Check order by on table with collated string column")](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/collation/CollationSuite.scala#L1117)).

Without a collation, parsing goes through
https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala#L89,
which uses the singleton `DataTypes.StringType`, so `type == DataTypes.StringType` holds.

With a collation, parsing goes through
https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala#L94,
which yields a different `StringType` instance, so the reference check `type == DataTypes.StringType` fails. I added `type.sameType(DataTypes.StringType)` so that collated string columns pass.

Actually, I think it should be
```
else if (
type == DataTypes.StringType ||
(type.sameType(DataTypes.StringType) && isSpark40Plus())
)
```
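
To illustrate why the reference check alone is not enough, here is a minimal, self-contained sketch. `SketchStringType` and `CollationCheckSketch` are hypothetical stand-ins, not Spark or Comet classes, and the `sameType` in the sketch is assumed to be a plain value comparison on the collation name; the real method may behave differently, especially for non-default collations.

```java
import java.util.Objects;

// Hypothetical stand-in for Spark's StringType; NOT the real class.
final class SketchStringType {
    // Mirrors the singleton that the parser uses when no collation is given.
    static final SketchStringType DEFAULT = new SketchStringType("UTF8_BINARY");

    final String collation;

    SketchStringType(String collation) {
        this.collation = Objects.requireNonNull(collation);
    }

    // Assumption: modeled as a value comparison on the collation name.
    boolean sameType(SketchStringType other) {
        return other != null && collation.equals(other.collation);
    }
}

class CollationCheckSketch {
    // Original check: reference equality against the singleton only.
    static boolean isStringByIdentity(SketchStringType type) {
        return type == SketchStringType.DEFAULT;
    }

    // Proposed check: singleton fast path, otherwise sameType gated on the Spark version.
    static boolean isStringProposed(SketchStringType type, boolean isSpark40Plus) {
        return type == SketchStringType.DEFAULT
            || (isSpark40Plus && type.sameType(SketchStringType.DEFAULT));
    }

    public static void main(String[] args) {
        // A separately constructed instance, as the collation-aware parser path would produce.
        SketchStringType fresh = new SketchStringType("UTF8_BINARY");

        System.out.println(isStringByIdentity(SketchStringType.DEFAULT)); // true
        System.out.println(isStringByIdentity(fresh));                    // false
        System.out.println(isStringProposed(fresh, true));                // true
        System.out.println(isStringProposed(fresh, false));               // false
    }
}
```

Gating on the version keeps the pre-4.0 behavior identical to the old reference check, which is the intent of the `isSpark40Plus()` condition suggested above.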
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]