goldmedal commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1682778258
########## datafusion/sqllogictest/test_files/map.slt: ########## @@ -131,17 +131,23 @@ SELECT MAKE_MAP([1,2], ['a', 'b'], [3,4], ['b']); ---- {[1, 2]: [a, b], [3, 4]: [b]} -query error +query ? SELECT MAKE_MAP('POST', 41, 'HEAD', 'ab', 'PATCH', 30); +---- +{POST: 41, HEAD: ab, PATCH: 30} Review Comment: > I think we need another `make_array` to does not apply coercion. I prefer to align the behaviour to other system unless there is a good reason not to. I did more tests for DuckDB behavior and found something interesting. I found they also try to coercion types when building arrays or maps. I arranged some notes for the behaviors: ## How DuckDB build a map It seems that they also transform to two lists and call map function. Just like my first design, using `make_array`. ```sql D select map {1:102, 2:20}; ┌───────────────────────────────────────────────────────────┐ │ main.map(main.list_value(1, 2), main.list_value(102, 20)) │ │ map(integer, integer) │ ├───────────────────────────────────────────────────────────┤ │ {1=102, 2=20} │ └───────────────────────────────────────────────────────────┘ ``` ## How DuckDB and DataFusion coercion array type ### DuckDB - Array constructed from `INT32 and numeric string`: DuckDB will make it be `INTEGER[]`. ```sql D select array[1,2,'3']; ┌────────────────────┐ │ (ARRAY[1, 2, '3']) │ │ int32[] │ ├────────────────────┤ │ [1, 2, 3] │ └────────────────────┘ D select typeof(array[1,2,'3']); ┌────────────────────────────┐ │ typeof((ARRAY[1, 2, '3'])) │ │ varchar │ ├────────────────────────────┤ │ INTEGER[] │ └────────────────────────────┘ ``` - Array constructed from `INT32 and non-numeric string`: DuckDB can't construct the array. ```sql D select array[1,2,'a']; Conversion Error: Could not convert the string 'a' to INT32 LINE 1: select array[1,2,'a']; ``` ### DataFusion - Array constructed from `INT32 and numeric string`: DataFusion will make it be `Uf8 array`. ```sql > select [1,2,'1']; +-----------------------------------------+ | make_array(Int64(1),Int64(2),Utf8("1")) | +-----------------------------------------+ | [1, 2, 1] | +-----------------------------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. > select arrow_typeof([1,2,'1']); +-----------------------------------------------------------------------------------------------------------------+ | arrow_typeof(make_array(Int64(1),Int64(2),Utf8("1"))) | +-----------------------------------------------------------------------------------------------------------------+ | List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) | +-----------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. ``` - Array constructed from `INT32 and non-numeric string`: DataFusion will make it be `Uf8 array`. ```sql > select [1,2,'a']; +-----------------------------------------+ | make_array(Int64(1),Int64(2),Utf8("a")) | +-----------------------------------------+ | [1, 2, a] | +-----------------------------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. > select arrow_typeof([1,2,'a']); +-----------------------------------------------------------------------------------------------------------------+ | arrow_typeof(make_array(Int64(1),Int64(2),Utf8("a"))) | +-----------------------------------------------------------------------------------------------------------------+ | List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) | +-----------------------------------------------------------------------------------------------------------------+ 1 row(s) fetched. Elapsed 0.001 seconds. ``` The behavior of type coercion between INT32 and String is really different. ## How DuckDB coercion map type - INT32 value and numeric string value: We can find the value `'20'` has been converted to `20`. ```sql D select map {1:10, 2:'20'}; ┌────────────────────────────────────────────────────────────┐ │ main.map(main.list_value(1, 2), main.list_value(10, '20')) │ │ map(integer, integer) │ ├────────────────────────────────────────────────────────────┤ │ {1=10, 2=20} │ └────────────────────────────────────────────────────────────┘ ``` - INT32 value and non-numeric string value. (It's what I tried in the first time. That's why I thought it shouldn't be allowed) ```sql D select map {1:10, 2:'abc'}; Conversion Error: Could not convert string 'abc' to INT32 LINE 1: select map {1:10, 2:'abc'}; ^ ``` ## Conclusion Referring to these behaviors, I think we can just back to using `make_array` to implement this. Because the behavior of type coercion is different, Our `make_map` can allow `map {1:10, 2:'a'}` but DuckDB can't do it. It makes sense for me. @jayzhan211 WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org