albertlockett commented on PR #22647: URL: https://github.com/apache/datafusion/pull/22647#issuecomment-4606772028
> Can we have an SLT case to prove it works end to end (to ensure it works with the signature code too) @Jefffrey if I'm not mistaken, we already have SLT tests for this: https://github.com/apache/datafusion/blob/00c35d0c1e7dc5c1acd5ee7c0b16f1da047faf24/datafusion/sqllogictest/test_files/functions.slt#L513-L516 https://github.com/apache/datafusion/blob/00c35d0c1e7dc5c1acd5ee7c0b16f1da047faf24/datafusion/sqllogictest/test_files/functions.slt#L433-L436 Invoking these functions doesn't fail when we invoke them on dictionary columns using the sql API. For example: ```rust use arrow::{ array::{DictionaryArray, RecordBatch, StringArray, UInt8Array}, util::pretty::print_batches, }; use arrow_schema::{DataType, Field, Schema}; use datafusion::{catalog::MemTable, functions::expr_fn::lower, prelude::SessionContext}; use datafusion::{logical_expr::col, physical_expr::create_physical_expr}; use datafusion_common::DFSchema; use std::sync::Arc; #[tokio::main] async fn main() { let schema = Arc::new(Schema::new(vec![ Field::new("a", DataType::Utf8, false), Field::new( "b", DataType::Dictionary(Box::new(DataType::UInt8), Box::new(DataType::Utf8)), false, ), ])); let rb = RecordBatch::try_new( schema.clone(), vec![ Arc::new(StringArray::from_iter_values(["a", "aa", "b"])), Arc::new(DictionaryArray::new( UInt8Array::from_iter_values([0, 1, 0]), Arc::new(StringArray::from_iter_values(["a", "B"])), )), ], ) .unwrap(); // -- Invoking through SQL API works --- let ctx = SessionContext::new(); ctx.register_table( "tab1", Arc::new(MemTable::try_new(schema.clone(), vec![vec![rb.clone()]]).unwrap()), ) .unwrap(); let df = ctx.sql("select lower(b) from tab1").await.unwrap(); let batches = df.collect().await.unwrap(); print_batches(&batches).unwrap(); // --- Invoking through expr API, does not work --- let physical_expr = create_physical_expr( &lower(col("b")), &DFSchema::try_from(schema).unwrap(), &ctx.state().execution_props(), ) .unwrap(); let expr_result = physical_expr.evaluate(&rb).unwrap(); // panics println!("{:?}", expr_result); } ``` I think this is because when planning, we coerce the field to Utf8View. The `explain` for that SQL query: ``` +---------------+----------------------------------------------------------------------+ | plan_type | plan | +---------------+----------------------------------------------------------------------+ | logical_plan | Projection: lower(CAST(tab1.b AS Utf8View)) | | | TableScan: tab1 projection=[b] | | physical_plan | ProjectionExec: expr=[lower(CAST(b@0 AS Utf8View)) as lower(tab1.b)] | | | DataSourceExec: partitions=1, partition_sizes=[1] | | | | +---------------+----------------------------------------------------------------------+ ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
