MathiasKindberg opened a new issue, #2722: URL: https://github.com/apache/arrow-rs/issues/2722
**Describe the bug**
We have a test that relied on triggering an Arrow error by passing an empty value to the decoder. Whether that is sensible is questionable, but that is how the test exercised failure cases in our business logic.

When upgrading from version 21 to 22 this test stopped working. While debugging I tracked the change to the [`next_batch`](https://docs.rs/arrow/latest/arrow/json/reader/struct.Decoder.html#method.next_batch) function. Looking through recent PRs, it seems that PR #2604 changed the behavior for empty input values.

The odd part is that calling `num_rows` on the produced record batch now returns 1 even though the batch contains no data. I would consider this very undesirable unless Arrow specifies that an empty `RecordBatch` has length 1.

**To Reproduce**
```
fn main() {
    let maybe_conforming: Vec<Result<serde_json::Value, arrow::error::ArrowError>> =
        vec![Ok(serde_json::json!({}))];
    let schema = std::sync::Arc::new(arrow::datatypes::Schema::new(vec![]));

    let decoder_options =
        arrow::json::reader::DecoderOptions::new().with_batch_size(maybe_conforming.len());
    let decoder = arrow::json::reader::Decoder::new(schema, decoder_options);

    // This pr changes it. https://github.com/apache/arrow-rs/pull/2604
    let result = decoder.next_batch(&mut maybe_conforming.into_iter());
    dbg!(&result);
    match result {
        Ok(Some(v)) => {
            dbg!(v.columns());
        }
        _ => (),
    }
}
```

For version 21 this gives:
```
[src/main.rs:12] &result = Err(
    InvalidArgumentError(
        "must either specify a row count or at least one column",
    ),
)
```

For version 22 this gives:
```
[src/main.rs:12] &result = Ok(
    Some(
        RecordBatch {
            schema: Schema {
                fields: [],
                metadata: {},
            },
            columns: [],
            row_count: 1,
        },
    ),
)
[src/main.rs:15] v.columns() = []
```

**Expected behavior**
I would expect `num_rows` to return 0 on an empty record batch.
I don't see any issue with empty record batches existing, although the exact behavior of the next_batch function should likely be more thoroughly documented.
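
To make the difference between the two outputs concrete, here is a toy model of it in plain Rust. `ToyBatch` and `build_batch` are hypothetical names and none of this is arrow-rs code. It assumes that version 22 passes the number of consumed input values along as an explicit row count, which is only my reading of the output above and not something I have confirmed in the source.

```rust
/// Toy stand-in for a record batch: column-major data plus a row count.
/// Hypothetical type for illustration only; this is not arrow-rs code.
#[derive(Debug, PartialEq)]
struct ToyBatch {
    columns: Vec<Vec<i64>>,
    row_count: usize,
}

/// Builds a batch. When at least one column exists, the row count is the
/// length of the first column (ragged columns are not checked in this sketch).
/// With no columns, the count must be supplied through `explicit_rows`.
fn build_batch(
    columns: Vec<Vec<i64>>,
    explicit_rows: Option<usize>,
) -> Result<ToyBatch, String> {
    let row_count = match (columns.first(), explicit_rows) {
        (Some(first), _) => first.len(),
        (None, Some(n)) => n,
        (None, None) => {
            return Err("must either specify a row count or at least one column".to_string())
        }
    };
    Ok(ToyBatch { columns, row_count })
}

fn main() {
    // One input object `{}` decoded against an empty schema yields zero columns.
    let inputs_consumed = 1;

    // Version 21-style outcome: no columns and no explicit count is an error.
    println!("{:?}", build_batch(vec![], None));

    // Assumed version 22-style outcome: the input count becomes the row count.
    println!("{:?}", build_batch(vec![], Some(inputs_consumed)));
}
```

Under that assumption, a row count of 1 means "one row with zero columns" and not "one value stored", so the open question is whether `num_rows` should count input rows or require data to be present.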
