kskalski opened a new issue, #4814:
URL: https://github.com/apache/arrow-rs/issues/4814
**Describe the bug**
Fields that contain only null values are skipped when inferring a `Schema`.
This makes a reader in strict mode fail, because it encounters a field that is
not included in the schema. In any case, silently dropping fields that are
present in the input seems wrong: this should either be controlled by an
option to the inference function, or be left to the user to filter out null
fields.
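
To make the two proposed behaviors concrete, here is a minimal std-only sketch. It does not use arrow: `Value`, `InferredType`, `infer_schema`, and the `drop_null_only` flag are hypothetical names invented for illustration, and the real inference code may be structured quite differently. It only shows that a null-only field can be kept with a `Null` type by default and dropped on request.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for illustration only; these are not arrow types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum InferredType {
    Null,
    Int64,
    Utf8,
}

/// Infer one type per field name. A field that only ever holds nulls is
/// kept as `InferredType::Null` unless `drop_null_only` is set.
fn infer_schema(
    records: &[Vec<(&str, Value)>],
    drop_null_only: bool,
) -> BTreeMap<String, InferredType> {
    let mut schema = BTreeMap::new();
    for record in records {
        for (name, value) in record {
            let seen = match value {
                Value::Null => InferredType::Null,
                Value::Int(_) => InferredType::Int64,
                Value::Str(_) => InferredType::Utf8,
            };
            // Register the field even if its value is null.
            let entry = schema.entry(name.to_string()).or_insert(InferredType::Null);
            // A concrete type replaces Null; resolving conflicts between
            // concrete types is out of scope for this sketch.
            if seen != InferredType::Null {
                *entry = seen;
            }
        }
    }
    if drop_null_only {
        // Opt-in filtering, instead of silently skipping the field.
        schema.retain(|_, ty| *ty != InferredType::Null);
    }
    schema
}

fn main() {
    let records = vec![vec![
        ("a", Value::Int(1)),
        ("b", Value::Str("str".to_string())),
        ("c", Value::Null),
    ]];
    println!("{:?}", infer_schema(&records, false));
    println!("{:?}", infer_schema(&records, true));
}
```

With the flag off, field `c` stays in the result, so a strict reader would find every input field in the schema; with it on, the caller has explicitly asked for null-only fields to be removed.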
**To Reproduce**
```rust
#[cfg(test)]
mod tests {
    const DATA2: &str = r#"{"a": 1, "b": "str", "c": null}"#;

    #[test]
    fn test_json_infers_null_schema() {
        let input_buf = std::io::Cursor::new(DATA2.as_bytes());
        let mut buf_reader = std::io::BufReader::new(input_buf);
        let schema =
            arrow::json::reader::infer_json_schema_from_seekable(&mut buf_reader, None)
                .unwrap();

        let field = schema
            .field_with_name("a")
            .expect("should contain numeric field");
        assert_eq!(&arrow::datatypes::DataType::Int64, field.data_type());

        let field = schema
            .field_with_name("c")
            .expect("should contain null field");
        assert_eq!(&arrow::datatypes::DataType::Null, field.data_type());
    }
}
```
produces
```
thread parquet::tests::test_json_infers_null_schema panicked at
lakeshore-history/src/parquet.rs:197:14:
should contain null field: SchemaError("Unable to get field named \"c\".
Valid fields: [\"a\", \"b\"]")
stack backtrace:
```
**Expected behavior**
The test passes: the inferred schema includes field `c` with data type `Null`.
**Additional context**
Tested with arrow 46.