kskalski opened a new issue, #4814:
URL: https://github.com/apache/arrow-rs/issues/4814

   **Describe the bug**
   Fields that contain only null values are skipped when inferring a `Schema`. 
As a result, a reader in strict mode fails when it encounters such a field, 
because the field is not included in the schema. In any case, silently dropping 
fields that are present in the input seems wrong. Maybe this should be 
controlled by an option to the inference function, or it should be left up to 
the user to filter out null fields.
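
To make the two alternatives concrete, here is a minimal standalone sketch of what such an option could look like. It is not arrow's implementation and uses no arrow types: `Value`, `InferredType`, `infer_schema` and the `keep_null_fields` flag are all hypothetical names made up for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed JSON scalar (not an arrow type).
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for an inferred column type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InferredType {
    Null,
    Int64,
    Utf8,
}

/// Infers a field-name -> type map from rows of (name, value) pairs.
///
/// `keep_null_fields` is the hypothetical option discussed above:
/// - `false`: fields that only ever hold null are dropped (the reported behavior);
/// - `true`: such fields are kept with type `Null`.
fn infer_schema(
    rows: &[Vec<(&str, Value)>],
    keep_null_fields: bool,
) -> BTreeMap<String, InferredType> {
    let mut schema = BTreeMap::new();
    for row in rows {
        for (name, value) in row {
            let seen = match value {
                Value::Null => InferredType::Null,
                Value::Int(_) => InferredType::Int64,
                Value::Str(_) => InferredType::Utf8,
            };
            // A field starts out as Null and is widened by later observations.
            let slot = schema.entry(name.to_string()).or_insert(InferredType::Null);
            *slot = match (*slot, seen) {
                // Null carries no type information, so the other side wins.
                (InferredType::Null, t) | (t, InferredType::Null) => t,
                (a, b) if a == b => a,
                // Conflicting types fall back to a string in this simplified model.
                _ => InferredType::Utf8,
            };
        }
    }
    if !keep_null_fields {
        // Drop every field that never saw a non-null value.
        schema.retain(|_, t| *t != InferredType::Null);
    }
    schema
}
```

With the record from the reproduction below (`a` = 1, `b` = "str", `c` = null), calling `infer_schema(&rows, false)` in this model yields only `a` and `b`, while `infer_schema(&rows, true)` also yields `c` with type `Null`, which is what a strict reader would need.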
   
   **To Reproduce**
   ```rust
   #[cfg(test)]
   mod tests {
       const DATA2: &str = r#"{"a": 1, "b": "str", "c": null}"#;
   
       #[test]
       fn test_json_infers_null_schema() {
           let input_buf = std::io::Cursor::new(DATA2.as_bytes());
           let mut buf_reader = std::io::BufReader::new(input_buf);
           let schema =
               arrow::json::reader::infer_json_schema_from_seekable(&mut buf_reader, None)
                   .unwrap();
           let field = schema
               .field_with_name("a")
               .expect("should contain numeric field");
           assert_eq!(&arrow::datatypes::DataType::Int64, field.data_type());
           let field = schema
               .field_with_name("c")
               .expect("should contain null field");
           assert_eq!(&arrow::datatypes::DataType::Null, field.data_type());
       }
   }
   ```
   This produces:
   ```
   thread parquet::tests::test_json_infers_null_schema panicked at lakeshore-history/src/parquet.rs:197:14:
   should contain null field: SchemaError("Unable to get field named \"c\". Valid fields: [\"a\", \"b\"]")
   stack backtrace:
   ```
   
   **Expected behavior**
   The test passes: the inferred schema contains field `c` with data type `Null`.
   
   **Additional context**
   Tested with arrow 46.

