MathiasKindberg opened a new issue, #2722:
URL: https://github.com/apache/arrow-rs/issues/2722

   **Describe the bug**
   We have a test that relied on triggering an Arrow error by passing an
   empty value to the decoder. Whether that is sensible is questionable, but it
   is how we exercised failure cases in our business logic. After upgrading
   from version 21 to 22, this test stopped working.
   
   Debugging it, I tracked the change to the
   [`next_batch`](https://docs.rs/arrow/latest/arrow/json/reader/struct.Decoder.html#method.next_batch)
   function. Digging through recent PRs, it seems that PR #2604 changed the
   behavior when dealing with empty input values.
   
   The weird thing is that calling `num_rows` on the produced record batch
   now returns 1 even though it contains no data. I would consider that very
   undesirable, unless Arrow specifies that an empty `RecordBatch` has
   length 1?
   
   **To Reproduce**
   ```
   fn main() {
       let maybe_conforming: Vec<Result<serde_json::Value, arrow::error::ArrowError>> =
           vec![Ok(serde_json::json!({}))];
   
       let schema = std::sync::Arc::new(arrow::datatypes::Schema::new(vec![]));
       let decoder_options =
           arrow::json::reader::DecoderOptions::new().with_batch_size(maybe_conforming.len());
       let decoder = arrow::json::reader::Decoder::new(schema, decoder_options);
   
       // The behavior of this call seems to have changed in https://github.com/apache/arrow-rs/pull/2604
       let result = decoder.next_batch(&mut maybe_conforming.into_iter());
       dbg!(&result);
       match result {
           Ok(Some(v)) => {
               dbg!(v.columns());
           }
           _ => (),
       }
   }
   ```
   
   For version 21 this gives:
   ```
   [src/main.rs:12] &result = Err(
       InvalidArgumentError(
           "must either specify a row count or at least one column",
       ),
   )
   ```
   
   For version 22 this gives:
   
   ```
   [src/main.rs:12] &result = Ok(
       Some(
           RecordBatch {
               schema: Schema {
                   fields: [],
                   metadata: {},
               },
               columns: [],
               row_count: 1,
           },
       ),
   )
   [src/main.rs:15] v.columns() = []
   ```
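
The two outputs above are consistent with the following std-only model. This is only a sketch, not the arrow-rs implementation: `Batch`, `try_new`, and `decode_empty_schema` are hypothetical names, and the assumption that version 22 passes an explicit row count equal to the number of input values is a guess based on the `row_count: 1` in the output.

```rust
// Hypothetical stand-in for a record batch; this is NOT the arrow-rs type.
#[derive(Debug, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
    row_count: usize,
}

impl Batch {
    // Build a batch. The row count is taken from `row_count` if given,
    // otherwise inferred from the first column. With neither available,
    // construction fails with the message seen on version 21.
    fn try_new(columns: Vec<Vec<i64>>, row_count: Option<usize>) -> Result<Batch, String> {
        let rows = match (row_count, columns.first()) {
            (Some(n), _) => n,
            (None, Some(col)) => col.len(),
            (None, None) => {
                return Err("must either specify a row count or at least one column".to_string())
            }
        };
        if columns.iter().any(|c| c.len() != rows) {
            return Err("all columns must have the same length".to_string());
        }
        Ok(Batch { columns, row_count: rows })
    }
}

// Model decoding `num_values` input values against a schema with no fields.
// `explicit_row_count` switches between the two observed behaviors
// (assumption: the newer behavior sets the row count to the number of values).
fn decode_empty_schema(num_values: usize, explicit_row_count: bool) -> Result<Batch, String> {
    let row_count = if explicit_row_count { Some(num_values) } else { None };
    Batch::try_new(vec![], row_count)
}
```

Under this model, `decode_empty_schema(1, false)` fails the way version 21 does, while `decode_empty_schema(1, true)` yields a column-less batch that reports one row, matching the version 22 output for a single `{}` input.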
   
   **Expected behavior**
   I would expect `num_rows` to return 0 on an empty record batch. I don't
   see any issue with empty record batches existing, although the exact
   behavior of the `next_batch` function should probably be documented more
   thoroughly.

