Kontinuation opened a new issue, #22662: URL: https://github.com/apache/datafusion/issues/22662
### Describe the bug

`AsyncFuncExpr` rebuilds the output `Field` for async scalar UDFs from only the field name, data type, and nullability. This drops any metadata attached by the UDF's `return_field_from_args(...)`.

This causes async scalar UDF result batches to lose extension metadata that is present during planning.

I found this while implementing an async UDF version of `RS_FromPath` in `apache/sedona-db` for loading raster data using GDAL:

- https://github.com/apache/sedona-db/pull/831

In that case, the async UDF returned a field representing raster data with extension metadata, but the collected result batches lost that metadata, which broke downstream logic that depended on the logical type.

### To Reproduce

A minimal repro is an async scalar UDF that:

1. returns a normal `Utf8` value
2. overrides `return_field_from_args(...)` to attach metadata such as:
   - `ARROW:extension:name = test.async.extension`

Then run:

```sql
SELECT async_extension(value) AS result
FROM test_table
```

and inspect the collected batch schema for `result`.

Without a fix, the field metadata is missing from the result batch schema.

### Expected behavior

The async UDF result field in collected batches should preserve the metadata computed by `return_field_from_args(...)`.

### Additional context

Root cause appears to be `AsyncFuncExpr::field(...)` in `datafusion/physical-expr/src/async_scalar_function.rs`, which reconstructs a new `Field` instead of preserving the already planned `return_field`.
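
To illustrate the suspected root cause without depending on DataFusion or Arrow, here is a minimal standalone sketch. The `Field` struct below is a simplified, hypothetical stand-in for Arrow's `Field`, and `planned_return_field`, `rebuild_field`, and `preserve_field` are illustrative names, not DataFusion APIs. It only shows why rebuilding a field from name, data type, and nullability loses metadata, while reusing the planned field keeps it.

```rust
use std::collections::HashMap;

/// Simplified stand-in for Arrow's `Field` (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Creates a field with empty metadata.
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
            metadata: HashMap::new(),
        }
    }

    /// Returns the same field with its metadata replaced.
    fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }
}

/// Plays the role of the field a UDF computes at planning time,
/// with the extension metadata from the repro attached.
fn planned_return_field() -> Field {
    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "test.async.extension".to_string(),
    );
    Field::new("result", "Utf8", true).with_metadata(metadata)
}

/// Lossy pattern: rebuild from name, data type, and nullability only.
/// The metadata map starts empty, so the extension name is dropped.
fn rebuild_field(planned: &Field) -> Field {
    Field::new(&planned.name, &planned.data_type, planned.nullable)
}

/// Preserving pattern: reuse the planned field as-is.
fn preserve_field(planned: &Field) -> Field {
    planned.clone()
}
```

Under these assumptions, `rebuild_field(&planned_return_field())` yields a field with an empty metadata map, whereas `preserve_field(&planned_return_field())` still carries `ARROW:extension:name`. An actual fix in `AsyncFuncExpr::field(...)` may look different; this is only a model of the behavior described in the report.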
