paleolimbot commented on code in PR #17986:
URL: https://github.com/apache/datafusion/pull/17986#discussion_r2418376002


##########
datafusion/sql/src/planner.rs:
##########
@@ -587,11 +589,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             })
     }
 
-    pub(crate) fn convert_data_type(&self, sql_type: &SQLDataType) -> 
Result<DataType> {
+    pub(crate) fn convert_data_type(&self, sql_type: &SQLDataType) -> 
Result<FieldRef> {

Review Comment:
   I think this change would also enable supporting UUIDs and other SQL types 
that map to extension types



##########
datafusion/sql/src/expr/mod.rs:
##########
@@ -986,7 +986,10 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
             _ => expr,
         };
 
-        Ok(Expr::Cast(Cast::new(Box::new(expr), dt)))
+        Ok(Expr::Cast(Cast::new(
+            Box::new(expr),
+            dt.data_type().clone(),
+        )))

Review Comment:
   Another place where metadata is dropped that I didn't update (casts)



##########
datafusion/common/src/metadata.rs:
##########
@@ -0,0 +1,250 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::{collections::BTreeMap, sync::Arc};
+
+use arrow::datatypes::Field;
+use hashbrown::HashMap;
+
+/// Literal metadata
+///
+/// Stores metadata associated with a literal expressions
+/// and is designed to be fast to `clone`.
+///
+/// This structure is used to store metadata associated with a literal 
expression, and it
+/// corresponds to the `metadata` field on [`Field`].
+///
+/// # Example: Create [`FieldMetadata`] from a [`Field`]
+/// ```
+/// # use std::collections::HashMap;
+/// # use datafusion_common::metadata::FieldMetadata;
+/// # use arrow::datatypes::{Field, DataType};
+/// # let field = Field::new("c1", DataType::Int32, true)
+/// #  .with_metadata(HashMap::from([("foo".to_string(), "bar".to_string())]));
+/// // Create a new `FieldMetadata` instance from a `Field`
+/// let metadata = FieldMetadata::new_from_field(&field);
+/// // There is also a `From` impl:
+/// let metadata = FieldMetadata::from(&field);
+/// ```
+///
+/// # Example: Update a [`Field`] with [`FieldMetadata`]
+/// ```
+/// # use datafusion_common::metadata::FieldMetadata;
+/// # use arrow::datatypes::{Field, DataType};
+/// # let field = Field::new("c1", DataType::Int32, true);
+/// # let metadata = FieldMetadata::new_from_field(&field);
+/// // Add any metadata from `FieldMetadata` to `Field`
+/// let updated_field = metadata.add_to_field(field);
+/// ```
+///
+#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
+pub struct FieldMetadata {

Review Comment:
   I just moved this from the expr crate so I could us it in ParamValues



##########
datafusion/expr/src/expr.rs:
##########
@@ -1370,13 +1142,21 @@ pub struct Placeholder {
     /// The identifier of the parameter, including the leading `$` (e.g, 
`"$1"` or `"$foo"`)
     pub id: String,
     /// The type the parameter will be filled in with
-    pub data_type: Option<DataType>,
+    pub field: Option<FieldRef>,

Review Comment:
   This is the main change. We can change this to a less severely breaking 
option (e.g., just add `metadata: FieldMetadata` to the struct)...I started 
with the most breaking version to identify its use in as many places as 
possible.



##########
datafusion/sql/src/planner.rs:
##########
@@ -256,7 +256,7 @@ impl IdentNormalizer {
 pub struct PlannerContext {
     /// Data types for numbered parameters ($1, $2, etc), if supplied
     /// in `PREPARE` statement
-    prepare_param_data_types: Arc<Vec<DataType>>,
+    prepare_param_data_types: Arc<Vec<FieldRef>>,

Review Comment:
   I chose to also do SQL here while I was at it...I could probably isolate 
these changes into a different PR



##########
datafusion/core/src/execution/session_state.rs:
##########
@@ -2012,7 +2013,7 @@ impl SimplifyInfo for SessionSimplifyProvider<'_> {
 #[derive(Debug)]
 pub(crate) struct PreparedPlan {
     /// Data types of the parameters
-    pub(crate) data_types: Vec<DataType>,
+    pub(crate) data_types: Vec<FieldRef>,

Review Comment:
   This is another change that had some impact



##########
datafusion/sql/tests/cases/params.rs:
##########
@@ -155,13 +155,13 @@ fn test_prepare_statement_to_plan_no_param() {
     assert_snapshot!(
         plan,
         @r#"
-    Prepare: "my_plan" [Int32]
+    Prepare: "my_plan" [Field { name: "", data_type: Int32, nullable: true, 
dict_id: 0, dict_is_ordered: false, metadata: {} }]

Review Comment:
   This highlights something else we'll have to solve: how to print types. 
Printing a field is not particularly helpful in this context. (If this change 
is vaguely in the right direction I'll revert the changes in this file and 
implement the Debug or DisplayAs trait or wherever these strings are coming 
from for now).



##########
datafusion/sql/src/statement.rs:
##########
@@ -1264,7 +1266,7 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
                     or_replace,
                     temporary,
                     name,
-                    return_type,
+                    return_type: return_type.map(|f| f.data_type().clone()),

Review Comment:
   One of the places where metadata is dropped that I didn't update 
(`DdlStatement::CreateFunction` args or return type)



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -1494,24 +1494,43 @@ impl LogicalPlan {
     }
 
     /// Walk the logical plan, find any `Placeholder` tokens, and return a map 
of their IDs and DataTypes
+    ///
+    /// Note that this will drop any extension or field metadata attached to 
parameters. Use
+    /// [`LogicalPlan::get_parameter_fields`] to keep extension metadata.
     pub fn get_parameter_types(
         &self,
     ) -> Result<HashMap<String, Option<DataType>>, DataFusionError> {
-        let mut param_types: HashMap<String, Option<DataType>> = 
HashMap::new();
+        let mut parameter_fields = self.get_parameter_fields()?;
+        Ok(parameter_fields
+            .drain()
+            .map(|(name, maybe_field)| {
+                (name, maybe_field.map(|field| field.data_type().clone()))
+            })
+            .collect())
+    }
+
+    /// Walk the logical plan, find any `Placeholder` tokens, and return a map 
of their IDs and FieldRefs
+    pub fn get_parameter_fields(
+        &self,
+    ) -> Result<HashMap<String, Option<FieldRef>>, DataFusionError> {
+        let mut param_types: HashMap<String, Option<FieldRef>> = 
HashMap::new();
 
         self.apply_with_subqueries(|plan| {
             plan.apply_expressions(|expr| {
                 expr.apply(|expr| {
-                    if let Expr::Placeholder(Placeholder { id, data_type }) = 
expr {
+                    if let Expr::Placeholder(Placeholder { id, field }) = expr 
{
                         let prev = param_types.get(id);
-                        match (prev, data_type) {
-                            (Some(Some(prev)), Some(dt)) => {
-                                if prev != dt {
+                        match (prev, field) {
+                            (Some(Some(prev)), Some(field)) => {
+                                // This check is possibly too strict (requires 
nullability and field
+                                // metadata align perfectly, rather than 
compute true type equality
+                                // when field metadata is representing an 
extension type)
+                                if prev != field {

Review Comment:
   This highlights something we'll have to fix: how to compute type equality 
(e.g., is a shredded and unshredded variant the same type?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to