Jefffrey commented on issue #17964: URL: https://github.com/apache/datafusion/issues/17964#issuecomment-3479215281
> For `rint -> round`, from what I understand, `rint` uses float [round_ties_even](https://doc.rust-lang.org/std/primitive.f32.html#method.round_ties_even) while `round` uses float [round](https://doc.rust-lang.org/std/primitive.f32.html#method.round), which would produce difference results for floats half-way between integers. Is that intended, or what was the reason? This is indeed the intention of the ticket; to find out these differences and ensure they are either eliminated if there is no reason for them to be different, or to document these differences so in future we know that they are meant to be different. In this case it's worth going through the commit/PR history to see if there are reasons given, otherwise falling back to testing against Spark to confirm if these are meant to be different. > I did not seem to find unit tests in `datafusion/functions-nested/src/make_array.rs` or `datafusion/spark/src/functions/spark_array.rs`. Are tests kept in other files? We prefer keeping our tests as sqllogictests, see: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest > For `coalesce_types`, `make_array` uses `type_union_resolution` while `spark_array` uses `comparison_coersion` (with folding), both of which are implemented in `datafusion_expr_common::type_coercion::binary`. It seems to me that they are very similar for the purpose of coalescing types and `make_array` can also use `comparison_coersion`, but wanted to check in case I misunderstood anything. Also are there unit tests for `type_union_resolution`? did not seem to find under `type_coercion::binary` module. Unfortunately I'm not sure of this myself. > `make_array` uses field name `LIST_FIELD_DEFAULT_NAME` (which is "item") while `spark_array` uses field name `ARRAY_FIELD_DEFAULT_NAME` (which is "element"). Are they intended to be different? I'm inclined to believe they are meant to be different, but some investigation & accompanying documentation would be very helpful for us in the future to confirm the fact. > `make_array` does not handle case for `DataType::LargeList` while `spark_array` does. Is there a reason for that? I actually think you may have spotted a bug here 🤔 The `spark_array` implementation does seem to suggest it _may_ return a `LargeList` for certain inputs, but this conflicts with the stated return type which should always return a `List` (not a `LargeList`): https://github.com/apache/datafusion/blob/9238779f45418e07108079aeef3b51de85e9fb8f/datafusion/spark/src/function/array/spark_array.rs#L75-L108 - Side note: it's also confusing how both `return_type` and `return_field_from_args` are implemented since usually it's one or the other, so could do with some refactoring here Might be worth mocking an example test which could exhibit this bug (where it returns a `LargeList` in spite of it stating its return type as regular `List`) See the tests for the spark version of array function here: https://github.com/apache/datafusion/blob/9238779f45418e07108079aeef3b51de85e9fb8f/datafusion/sqllogictest/test_files/spark/array/array.slt And use this as reference for how we can construct a `LargeList` via SLT tests: https://github.com/apache/datafusion/blob/9238779f45418e07108079aeef3b51de85e9fb8f/datafusion/sqllogictest/test_files/array.slt#L7946-L7949 @jizezhang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
