Re: [PR] bug: improve schema checking for insert into cases [datafusion]

via GitHub Tue, 11 Feb 2025 04:45:43 -0800


jayzhan211 commented on code in PR #14572:
URL: https://github.com/apache/datafusion/pull/14572#discussion_r1950779945



##########
datafusion/common/src/dfschema.rs:
##########
@@ -1028,20 +1028,48 @@ impl SchemaExt for Schema {
             })
     }
 
-    fn logically_equivalent_names_and_types(&self, other: &Self) -> bool {
+    // There are three cases we need to check
+    // 1. The len of the schema of the plan and the schema of the table should be the same
+    // 2. The nullable flag of the schema of the plan and the schema of the table should be the same
+    // 3. The datatype of the schema of the plan and the schema of the table should be the same
+    fn logically_equivalent_names_and_types(&self, other: &Self) -> Result<(), String> {

Review Comment:
   Why not `Result<bool>`?
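
To make the two return shapes concrete, here is a minimal sketch that compares them on the length check alone. The function names `check_len_unit` and `check_len_bool` are hypothetical, and `String` stands in for the error type only to keep the example self-contained; in DataFusion, `Result<bool>` would use the crate's own error type.

```rust
/// Shape used in the PR: success carries no value, and the reason for a
/// mismatch travels in `Err`.
fn check_len_unit(table_len: usize, query_len: usize) -> Result<(), String> {
    if table_len != query_len {
        return Err(format!(
            "Expected table schema length: {}, got: {}",
            table_len, query_len
        ));
    }
    Ok(())
}

/// Shape raised in the review: a mismatch is reported as `Ok(false)`,
/// which leaves `Err` free for failures of the check itself but carries
/// no explanation of what differed.
fn check_len_bool(table_len: usize, query_len: usize) -> Result<bool, String> {
    Ok(table_len == query_len)
}
```

With the first shape the caller can propagate the message with `?`; with the second, the caller has to build its own message when it sees `Ok(false)`.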



##########
datafusion/common/src/dfschema.rs:
##########
@@ -1028,20 +1028,48 @@ impl SchemaExt for Schema {
             })
     }
 
-    fn logically_equivalent_names_and_types(&self, other: &Self) -> bool {
+    // There are three cases we need to check
+    // 1. The len of the schema of the plan and the schema of the table should be the same
+    // 2. The nullable flag of the schema of the plan and the schema of the table should be the same
+    // 3. The datatype of the schema of the plan and the schema of the table should be the same
+    fn logically_equivalent_names_and_types(&self, other: &Self) -> Result<(), String> {
         if self.fields().len() != other.fields().len() {
-            return false;
+            return Err(format!(
+                "Inserting query must have the same schema length as the table. \
+            Expected table schema length: {}, got: {}",
+                self.fields().len(),
+                other.fields().len()
+            ));
         }
 
         self.fields()
             .iter()
             .zip(other.fields().iter())
-            .all(|(f1, f2)| {
-                f1.name() == f2.name()
-                    && DFSchema::datatype_is_logically_equal(
+            .try_for_each(|(f1, f2)| {
+                if f1.is_nullable() != f2.is_nullable() {

Review Comment:
   If the field is nullable, we can insert a non-null column. Similar to #14519.
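
One way to read this comment is that the nullability check should be directional rather than a strict equality. The sketch below illustrates that idea under stated assumptions: `Column` and `check_nullability` are hypothetical stand-ins, not DataFusion or Arrow APIs, and whether the project wants to reject the remaining case at plan time is not settled by this thread.

```rust
/// Hypothetical stand-in for a schema field; only what the check needs.
struct Column {
    name: &'static str,
    nullable: bool,
}

/// Returns `Ok(())` when values from `source` can be written into `target`
/// as far as nullability is concerned.
fn check_nullability(target: &Column, source: &Column) -> Result<(), String> {
    // Only a nullable source feeding a non-nullable target can carry NULLs
    // that the table cannot store. Every other combination is accepted,
    // including a non-nullable source feeding a nullable target.
    if source.nullable && !target.nullable {
        return Err(format!(
            "Column {} is nullable in the query but not in the table",
            source.name
        ));
    }
    Ok(())
}
```

Under this rule a non-nullable query column inserts cleanly into a nullable table column, which the equality check `f1.is_nullable() != f2.is_nullable()` would reject.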
   



##########
datafusion/sqllogictest/test_files/insert.slt:
##########
@@ -78,7 +104,7 @@ physical_plan
 query I
 INSERT INTO table_without_values SELECT
 SUM(c4) OVER(PARTITION BY c1 ORDER BY c9 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING),
-COUNT(*) OVER(PARTITION BY c1 ORDER BY c9 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
+NULLIF(COUNT(*) OVER(PARTITION BY c1 ORDER BY c9 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING), 0)

Review Comment:
   Why do we need NULLIF? Does its use indicate a potential issue?
   
   
   
   
   
   
   
   


