paleolimbot opened a new issue, #17425:
URL: https://github.com/apache/datafusion/issues/17425
### Describe the bug
Function calls that return scalars can be used in SQL `VALUES`; however, if
their return fields carry extension metadata, that metadata is dropped.
### To Reproduce
Output of the reproducer below:
```
Regular select:
Field { name: "extension", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {"ARROW:extension:metadata": "foofy.foofy"} }
VALUES select:
Field { name: "extension", data_type: Utf8, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }
```
```rust
use std::collections::HashMap;

use datafusion::{
    arrow::datatypes::DataType,
    logical_expr::{ScalarUDFImpl, Signature, Volatility},
    prelude::*,
};

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    ctx.register_udf(MakeExtension::default().into());

    // Plain SELECT: the metadata set by the UDF reaches the output schema.
    let batches = ctx
        .sql("SELECT make_extension('foofy zero') as extension")
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
    println!("Regular select:");
    println!("{:?}", batches[0].schema().field(0));

    // Same UDF inside VALUES: the metadata is missing from the output schema.
    let batches = ctx
        .sql(
            "
            SELECT extension FROM (VALUES
                ('one', make_extension('foofy one')),
                ('two', make_extension('foofy two')),
                ('three', make_extension('foofy three')))
            AS t(string, extension)
            ",
        )
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
    println!("\nVALUES select:");
    println!("{:?}", batches[0].schema().field(0));
}

#[derive(Debug)]
struct MakeExtension {
    signature: Signature,
}

impl Default for MakeExtension {
    fn default() -> Self {
        Self {
            signature: Signature::user_defined(Volatility::Immutable),
        }
    }
}

impl ScalarUDFImpl for MakeExtension {
    fn as_any(&self) -> &dyn std::any::Any {
        self
    }

    fn name(&self) -> &str {
        "make_extension"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn coerce_types(
        &self,
        arg_types: &[DataType],
    ) -> datafusion::error::Result<Vec<DataType>> {
        Ok(arg_types.to_vec())
    }

    fn return_type(
        &self,
        _arg_types: &[DataType],
    ) -> datafusion::error::Result<DataType> {
        unreachable!("This shouldn't have been called")
    }

    // Returns the first argument's field with the extension metadata attached.
    fn return_field_from_args(
        &self,
        args: datafusion::logical_expr::ReturnFieldArgs,
    ) -> datafusion::error::Result<datafusion::arrow::datatypes::FieldRef> {
        Ok(args.arg_fields[0]
            .as_ref()
            .clone()
            .with_metadata(HashMap::from([(
                "ARROW:extension:metadata".to_string(),
                "foofy.foofy".to_string(),
            )]))
            .into())
    }

    // Passes the first argument through unchanged.
    fn invoke_with_args(
        &self,
        args: datafusion::logical_expr::ScalarFunctionArgs,
    ) -> datafusion::error::Result<datafusion::logical_expr::ColumnarValue> {
        Ok(args.args[0].clone())
    }
}
```
### Expected behavior
I would have expected the field metadata (if identical for all rows) to be
propagated to the schema of the `VALUES` expression. This does introduce the
complexity of type equality, but byte-for-byte hash map equality should be
safe. A "user defined extension type" (if there ever is one) could define a
more lenient equality checker (e.g., JSON object metadata equality for
extension types whose serialization is JSON).
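
As a rough illustration of the byte-for-byte check described above, here is a minimal sketch using only the standard library. The helper name `common_metadata` is hypothetical and not part of DataFusion; it assumes the metadata of each row's expression in a given column is available as a `HashMap<String, String>`, as on an Arrow `Field`.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not a DataFusion API): given the metadata of each
/// row's expression for one VALUES column, return the shared metadata if all
/// maps are exactly equal, or `None` if they differ or the input is empty.
fn common_metadata(
    all: &[HashMap<String, String>],
) -> Option<HashMap<String, String>> {
    let (first, rest) = all.split_first()?;
    // `HashMap` equality compares keys and values exactly and ignores
    // insertion order, so this is the strict "byte-for-byte" comparison.
    if rest.iter().all(|m| m == first) {
        Some(first.clone())
    } else {
        None
    }
}
```

Under this sketch, a caller would propagate the returned map to the output field when it gets `Some`, and fall back to empty metadata (the current behavior) when it gets `None`. A more lenient comparison for a particular extension type would replace the `m == first` check.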
### Additional context
_No response_