paleolimbot opened a new issue, #17425:
URL: https://github.com/apache/datafusion/issues/17425

   ### Describe the bug
   
   Function calls that return scalars can be used in SQL `VALUES`; however, if the returned field carries extension metadata, that metadata is dropped.
   
   ### To Reproduce
   
   Output of the reproducer:
   
   ```
   Regular select:
   Field { name: "extension", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {"ARROW:extension:metadata": "foofy.foofy"} }
   
   
   VALUES select:
   Field { name: "extension", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
   ```
   
   ```rust
   use std::collections::HashMap;
   
   use datafusion::{
       arrow::datatypes::DataType,
       logical_expr::{ScalarUDFImpl, Signature, Volatility},
       prelude::*,
   };
   
   #[tokio::main]
   async fn main() {
       let ctx = SessionContext::new();
       ctx.register_udf(MakeExtension::default().into());
   
       let batches = ctx
           .sql("SELECT make_extension('foofy zero') as extension")
           .await
           .unwrap()
           .collect()
           .await
           .unwrap();
       println!("Regular select:");
       println!("{:?}", batches[0].schema().field(0));
   
       let batches = ctx
           .sql(
               "
   SELECT extension FROM (VALUES
       ('one', make_extension('foofy one')),
       ('two', make_extension('foofy two')),
       ('three', make_extension('foofy three')))
   AS t(string, extension)
           ",
           )
           .await
           .unwrap()
           .collect()
           .await
           .unwrap();
   
       println!("\nVALUES select:");
       println!("{:?}", batches[0].schema().field(0));
   }
   
   #[derive(Debug)]
   struct MakeExtension {
       signature: Signature,
   }
   
   impl Default for MakeExtension {
       fn default() -> Self {
           Self {
               signature: Signature::user_defined(Volatility::Immutable),
           }
       }
   }
   
   impl ScalarUDFImpl for MakeExtension {
       fn as_any(&self) -> &dyn std::any::Any {
           self
       }
   
       fn name(&self) -> &str {
           "make_extension"
       }
   
       fn signature(&self) -> &Signature {
           &self.signature
       }
   
       fn coerce_types(&self, arg_types: &[DataType]) -> datafusion::error::Result<Vec<DataType>> {
           Ok(arg_types.to_vec())
       }
   
       fn return_type(&self, _arg_types: &[DataType]) -> datafusion::error::Result<DataType> {
           unreachable!("This shouldn't have been called")
       }
   
       fn return_field_from_args(
           &self,
           args: datafusion::logical_expr::ReturnFieldArgs,
       ) -> datafusion::error::Result<datafusion::arrow::datatypes::FieldRef> {
           Ok(args.arg_fields[0]
               .as_ref()
               .clone()
               .with_metadata(HashMap::from([(
                   "ARROW:extension:metadata".to_string(),
                   "foofy.foofy".to_string(),
               )]))
               .into())
       }
   
       fn invoke_with_args(
           &self,
           args: datafusion::logical_expr::ScalarFunctionArgs,
       ) -> datafusion::error::Result<datafusion::logical_expr::ColumnarValue> {
           Ok(args.args[0].clone())
       }
   }
   ```
   
   ### Expected behavior
   
   I would expect the field metadata, when it is identical for all items, to be propagated to the schema of the `VALUES` expression. This raises the question of how to define type equality, but byte-for-byte equality of the metadata hash maps should be safe. A "user defined extension type" (if there ever is one) could define a more lenient equality check (e.g., JSON object equality for extension types whose metadata is serialized as JSON).
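
A minimal sketch of the byte-for-byte check described above, using only the standard library. The helper name `common_metadata` is hypothetical and is not an existing DataFusion API; it only illustrates the rule "keep the metadata if every item agrees, otherwise drop it".

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of DataFusion): returns the metadata shared
/// by every item of a VALUES column, or an empty map if any item differs.
fn common_metadata(items: &[HashMap<String, String>]) -> HashMap<String, String> {
    match items.split_first() {
        // HashMap equality compares keys and values exactly, which gives the
        // strict byte-for-byte comparison on the metadata strings.
        Some((first, rest)) if rest.iter().all(|m| m == first) => first.clone(),
        // No items, or at least one mismatch: fall back to empty metadata.
        _ => HashMap::new(),
    }
}
```

Under this rule, the three `make_extension(...)` rows in the reproducer would keep their shared metadata, while a column mixing extension and plain values would fall back to empty metadata, as happens today.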
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

