goldmedal commented on code in PR #11452:
URL: https://github.com/apache/datafusion/pull/11452#discussion_r1682778258


##########
datafusion/sqllogictest/test_files/map.slt:
##########
@@ -131,17 +131,23 @@ SELECT MAKE_MAP([1,2], ['a', 'b'], [3,4], ['b']);
 ----
 {[1, 2]: [a, b], [3, 4]: [b]}
 
-query error
+query ?
 SELECT MAKE_MAP('POST', 41, 'HEAD', 'ab', 'PATCH', 30);
+----
+{POST: 41, HEAD: ab, PATCH: 30}

Review Comment:
   > I think we need another `make_array` to does not apply coercion. I prefer 
to align the behaviour to other system unless there is a good reason not to.
   
   I did more tests for DuckDB behavior and found something interesting. I 
found they also try to coercion types when building arrays or maps.
   I arranged some notes for the behaviors:
   
   ## How DuckDB build a map
   It seems that they also transform to two lists and call map function. Just 
like my first design, using `make_array`.
   ```sql
   D select map {1:102, 2:20};
   ┌───────────────────────────────────────────────────────────┐
   │ main.map(main.list_value(1, 2), main.list_value(102, 20)) │
   │                   map(integer, integer)                   │
   ├───────────────────────────────────────────────────────────┤
   │ {1=102, 2=20}                                             │
   └───────────────────────────────────────────────────────────┘
   ```
   
   ## How DuckDB and DataFusion coercion array type
   ### DuckDB
   - Array constructed from `INT32 and numeric string`: DuckDB will make it be 
`INTEGER[]`.
   ```sql
   D select array[1,2,'3'];
   ┌────────────────────┐
   │ (ARRAY[1, 2, '3']) │
   │      int32[]       │
   ├────────────────────┤
   │ [1, 2, 3]          │
   └────────────────────┘
   D select typeof(array[1,2,'3']);
   ┌────────────────────────────┐
   │ typeof((ARRAY[1, 2, '3'])) │
   │          varchar           │
   ├────────────────────────────┤
   │ INTEGER[]                  │
   └────────────────────────────┘
   ```
   - Array constructed from `INT32 and non-numeric string`: DuckDB can't 
construct the array.
   ```sql
   D select array[1,2,'a'];
   Conversion Error: Could not convert the string 'a' to INT32
   LINE 1: select array[1,2,'a'];
   ```
   ### DataFusion
   - Array constructed from `INT32 and numeric string`: DataFusion will make it 
be `Uf8 array`.
   ```sql
   > select [1,2,'1'];
   +-----------------------------------------+
   | make_array(Int64(1),Int64(2),Utf8("1")) |
   +-----------------------------------------+
   | [1, 2, 1]                               |
   +-----------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
     
   > select arrow_typeof([1,2,'1']);
   
+-----------------------------------------------------------------------------------------------------------------+
   | arrow_typeof(make_array(Int64(1),Int64(2),Utf8("1")))                      
                                     |
   
+-----------------------------------------------------------------------------------------------------------------+
   | List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }) |
   
+-----------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
   ```
   - Array constructed from `INT32 and non-numeric string`: DataFusion will 
make it be `Uf8 array`.
   ```sql
   > select [1,2,'a'];
   +-----------------------------------------+
   | make_array(Int64(1),Int64(2),Utf8("a")) |
   +-----------------------------------------+
   | [1, 2, a]                               |
   +-----------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
   
   > select arrow_typeof([1,2,'a']);
   
+-----------------------------------------------------------------------------------------------------------------+
   | arrow_typeof(make_array(Int64(1),Int64(2),Utf8("a")))                      
                                     |
   
+-----------------------------------------------------------------------------------------------------------------+
   | List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }) |
   
+-----------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.001 seconds.
   ```
   The behavior of type coercion between INT32 and String is really different. 
   
   ## How DuckDB coercion map type
   - INT32 value and numeric string value: We can find the value `'20'` has 
been converted to `20`.
   ```sql
   D select map {1:10, 2:'20'};
   ┌────────────────────────────────────────────────────────────┐
   │ main.map(main.list_value(1, 2), main.list_value(10, '20')) │
   │                   map(integer, integer)                    │
   ├────────────────────────────────────────────────────────────┤
   │ {1=10, 2=20}                                               │
   └────────────────────────────────────────────────────────────┘
   ```
   - INT32 value and non-numeric string value. (It's what I tried in the first 
time. That's why I thought it shouldn't be allowed)
   ```sql
   D select map {1:10, 2:'abc'};
   Conversion Error: Could not convert string 'abc' to INT32
   LINE 1: select map {1:10, 2:'abc'};
                               ^
   ```
   
   ## Conclusion
   Referring to these behaviors, I think we can just back to using `make_array` 
to implement this. Because the behavior of type coercion is different, Our 
`make_map` can allow `map {1:10, 2:'a'}` but DuckDB can't do it. It makes sense 
for me.
   @jayzhan211  WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to