github-actions[bot] commented on code in PR #64080:
URL: https://github.com/apache/doris/pull/64080#discussion_r3355749053


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/types/ArrayType.java:
##########
@@ -57,6 +58,14 @@ public DataType getItemType() {
         return itemType;
     }
 
+    @Override
+    public boolean canSafetyCastTo(DataType target) {
+        if (target instanceof ArrayType) {
+            return itemType.canSafetyCastTo(((ArrayType) target).itemType);
+        }
+        return target instanceof CharacterType;

Review Comment:
   This marks `ARRAY -> STRING/VARCHAR/CHAR` as distinctness-preserving, but 
the BE string form for arrays is not injective. `DataTypeArraySerDe::to_string` 
joins elements with `, `, and the nested `DataTypeStringSerDeBase::to_string` 
wraps string elements in quotes without escaping embedded quotes. Two distinct 
arrays can therefore serialize to the same text once a string element contains 
the quote/delimiter sequence: for example, `array('a', 'b')` (two elements) and 
`array('a", "b')` (one element) both produce the element text `"a", "b"`. With 
this predicate, `Project(CAST(array_col AS STRING))` can be pushed below 
`UNION DISTINCT`, so the distinct runs on the stringified values and collapses 
a row that the original plan would keep. Please do not treat complex-to-character 
casts as safe unless the serialization is proven injective; the same concern 
applies to the `target instanceof CharacterType` branches this PR adds to 
`MapType` and `StructType`.
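   The collision can be reproduced with a small Java model of the serialization 
described above. This is a sketch, not the BE code: the `serialize` helper and 
the `ArrayToStringCollision` class are hypothetical, and the surrounding `[`/`]` 
brackets are an assumption about the output format. It shows that DISTINCT over 
the arrays keeps two rows, while DISTINCT over their string forms keeps one.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ArrayToStringCollision {
    // Hypothetical model of the BE behaviour described in the review:
    // string elements are wrapped in double quotes with NO escaping, and
    // elements are joined with ", ". Brackets are an assumed wrapper.
    static String serialize(List<String> array) {
        return array.stream()
                .map(s -> "\"" + s + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // array('a', 'b'): two elements.
        List<String> twoElements = Arrays.asList("a", "b");
        // array('a", "b'): one element that embeds the quote/delimiter sequence.
        List<String> oneElement = Arrays.asList("a\", \"b");

        List<List<String>> rows = Arrays.asList(twoElements, oneElement);

        // DISTINCT on the arrays themselves: the rows differ, both are kept.
        Set<List<String>> distinctArrays = new LinkedHashSet<>(rows);

        // DISTINCT after CAST(... AS STRING): the string forms are equal.
        Set<String> distinctStrings = rows.stream()
                .map(ArrayToStringCollision::serialize)
                .collect(Collectors.toCollection(LinkedHashSet::new));

        System.out.println(serialize(twoElements));
        System.out.println(serialize(oneElement));
        System.out.println("distinct arrays:  " + distinctArrays.size());
        System.out.println("distinct strings: " + distinctStrings.size());
    }
}
```

Under this model both rows print as `["a", "b"]`, the array-level distinct count 
is 2 and the string-level count is 1, which is exactly the row loss a 
`UNION DISTINCT` pushdown would introduce.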



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

