github-actions[bot] commented on code in PR #64080:
URL: https://github.com/apache/doris/pull/64080#discussion_r3355749053
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/types/ArrayType.java:
##########
@@ -57,6 +58,14 @@ public DataType getItemType() {
         return itemType;
     }
+    @Override
+    public boolean canSafetyCastTo(DataType target) {
+        if (target instanceof ArrayType) {
+            return itemType.canSafetyCastTo(((ArrayType) target).itemType);
+        }
+        return target instanceof CharacterType;
Review Comment:
This marks `ARRAY -> STRING/VARCHAR/CHAR` as distinctness-preserving, but
the BE string form for arrays is not injective. `DataTypeArraySerDe::to_string`
joins elements with `, ` and nested `DataTypeStringSerDeBase::to_string` only
wraps string elements in quotes without escaping embedded quotes. For example,
the two-element `array('a', 'b')` and a single-element value such as
`array('a", "b')` can serialize to the same text, because the second value
embeds the quote and delimiter characters inside a string element, while the two
are distinct as arrays. With this predicate, `Project(CAST(array_col AS STRING))`
can be pushed below `UNION DISTINCT`, so the distinct runs on the stringified
values and collapses a row that the original plan would keep. Please do not
treat complex-to-character casts as safe unless the serialization is proven
injective; the same concern applies to the `MapType` and `StructType` `target
instanceof CharacterType` branches added in this PR.
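
A minimal sketch of the collision, assuming the rendering works as described
above (elements joined with `, `, string elements wrapped in double quotes, no
escaping). `render` is a hypothetical stand-in written for illustration, not
the actual BE code, and the surrounding brackets are an assumption too:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ArrayToStringCollision {
    // Hypothetical stand-in for the array-to-string rendering: each string
    // element is wrapped in double quotes WITHOUT escaping, joined with ", ".
    static String render(List<String> array) {
        return array.stream()
                .map(s -> "\"" + s + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> twoElements = Arrays.asList("a", "b");
        // One element whose content embeds the quote and delimiter characters.
        List<String> oneElement = Arrays.asList("a\", \"b");

        // DISTINCT evaluated on the array values keeps both rows.
        Set<List<String>> distinctArrays = new HashSet<>();
        distinctArrays.add(twoElements);
        distinctArrays.add(oneElement);

        // DISTINCT evaluated after the cast sees a single string.
        Set<String> distinctStrings = new HashSet<>();
        distinctStrings.add(render(twoElements));
        distinctStrings.add(render(oneElement));

        System.out.println(render(twoElements));    // ["a", "b"]
        System.out.println(render(oneElement));     // ["a", "b"]
        System.out.println(distinctArrays.size());  // 2
        System.out.println(distinctStrings.size()); // 1
    }
}
```

If the real serializer behaves like this sketch, pushing the cast below
`UNION DISTINCT` changes the row count from 2 to 1 for these inputs.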
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]