alamb commented on code in PR #7481:
URL: https://github.com/apache/arrow-rs/pull/7481#discussion_r2080096874


##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -506,37 +525,47 @@ impl ArrowReaderMetadata {
         // parquet_to_arrow_field_levels is expected to throw an error if the schemas have
         // different lengths, but we check here to be safe.
         if inferred_len != supplied_len {
-            Err(arrow_err!(format!(
-                "incompatible arrow schema, expected {} columns received {}",
+            return Err(arrow_err!(format!(
+                "Incompatible supplied Arrow schema: expected {} columns received {}",
                 inferred_len, supplied_len
-            )))
-        } else {
-            let diff_fields: Vec<_> = supplied_schema
-                .fields()
-                .iter()
-                .zip(fields.iter())
-                .filter_map(|(field1, field2)| {
-                    if field1 != field2 {
-                        Some(field1.name().clone())
-                    } else {
-                        None
-                    }
-                })
-                .collect();
+            )));
+        }
 
-            if !diff_fields.is_empty() {
-                Err(ParquetError::ArrowError(format!(
-                    "incompatible arrow schema, the following fields could not be cast: [{}]",
-                    diff_fields.join(", ")
-                )))
-            } else {
-                Ok(Self {
-                    metadata,
-                    schema: supplied_schema,
-                    fields: field_levels.levels.map(Arc::new),
-                })
+        let mut errors = Vec::new();

Review Comment:
   I think relaxing the check means that a user could supply the reader a schema with metadata that is not present in the file, and the reader would then produce RecordBatches that carry that metadata.
   
   I agree `field_cast` is the right longer-term approach in DataFusion.
   
   In arrow-rs, I think the field "casting" happens while the parquet data is being read.
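
To make the metadata concern concrete, here is a minimal, std-only sketch. It assumes the relaxed check ignores field metadata, which the diff above does not confirm. `Field`, `strict_mismatches`, and `relaxed_mismatches` are hypothetical stand-ins for illustration, not the arrow-rs types or the code in this PR.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for an Arrow field
/// (not `arrow_schema::Field`).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
            metadata: HashMap::new(),
        }
    }

    /// Attach one metadata entry, builder style.
    fn with_metadata(mut self, key: &str, value: &str) -> Self {
        self.metadata.insert(key.to_string(), value.to_string());
        self
    }
}

/// Strict check: any difference between paired fields, including
/// metadata, is reported by field name.
fn strict_mismatches(supplied: &[Field], inferred: &[Field]) -> Vec<String> {
    supplied
        .iter()
        .zip(inferred)
        .filter(|(s, i)| s != i)
        .map(|(s, _)| s.name.clone())
        .collect()
}

/// Relaxed check (assumed behavior): name, type, and nullability must
/// match, but metadata is ignored.
fn relaxed_mismatches(supplied: &[Field], inferred: &[Field]) -> Vec<String> {
    supplied
        .iter()
        .zip(inferred)
        .filter(|(s, i)| {
            s.name != i.name || s.data_type != i.data_type || s.nullable != i.nullable
        })
        .map(|(s, _)| s.name.clone())
        .collect()
}
```

With an inferred field `id: Int64` and a supplied field that differs only by an extra `origin` metadata entry, the strict check reports `id` while the relaxed check reports nothing. If the reader then adopts the supplied schema, the batches it produces would carry metadata that was never in the file.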



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

