tustvold commented on code in PR #3677:
URL: https://github.com/apache/arrow-rs/pull/3677#discussion_r1102733671
##########
arrow-csv/src/reader/mod.rs:
##########
@@ -2269,4 +2274,73 @@ mod tests {
"Csv error: Encountered invalid UTF-8 data for line 1 and field 1",
);
}
+
Review Comment:
> namely that data that has been read can be decoded and read out as record
batches prior to sending the end of stream.
Because that isn't the issue. The _problem_ was that it would try to fill
the buffer again, even if it had already read the batch_size number of rows.
Without the change in this PR you have
```
fill_sizes: [23, 3, 3, 0, 0]
```
In the case of a streaming decoder, this could potentially cause it to wait
for more input when it doesn't actually need any, as it already has the
requisite number of rows.
> I wonder can we write a test like
This has never been supported, and realistically can't be supported, as
`BufRead::fill_buf` will only return an empty slice on EOS; otherwise it will
block. There is no "`fill_buf` only if data is available" variant that I am
aware of.
Edit: it would appear there is an experimental API -
https://github.com/rust-lang/rust/issues/86423
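
To make the trace above concrete, here is a minimal self-contained sketch of the read-loop behavior being discussed. `ToyDecoder` and `fill_sizes` are hypothetical stand-ins for arrow-csv's `Decoder::decode` and the reader's fill loop, not the actual PR code; the point is the early break that stops the loop from calling `fill_buf` again once `batch_size` rows are already buffered:

```rust
use std::io::{BufRead, Cursor};

/// Toy newline-counting decoder standing in for arrow-csv's `Decoder`
/// (a hypothetical simplification for illustration).
struct ToyDecoder {
    batch_size: usize,
    pending_rows: usize,
}

impl ToyDecoder {
    /// Consume bytes until `batch_size` rows are buffered,
    /// returning how many bytes were consumed.
    fn decode(&mut self, buf: &[u8]) -> usize {
        let mut consumed = 0;
        for (i, &b) in buf.iter().enumerate() {
            if self.pending_rows == self.batch_size {
                break;
            }
            consumed = i + 1;
            if b == b'\n' {
                self.pending_rows += 1;
            }
        }
        consumed
    }
}

/// Records the length returned by each `fill_buf` call, mirroring the
/// `fill_sizes` trace in the comment above. The check before `fill_buf`
/// is the key change: with a full batch pending, the loop must not ask
/// the underlying reader for more input (which could block on a stream).
fn fill_sizes<R: BufRead>(mut reader: R, decoder: &mut ToyDecoder) -> Vec<usize> {
    let mut sizes = vec![];
    loop {
        if decoder.pending_rows >= decoder.batch_size {
            break; // batch is ready: don't touch the reader again
        }
        let buf = reader.fill_buf().unwrap();
        sizes.push(buf.len());
        if buf.is_empty() {
            break; // EOS
        }
        let consumed = decoder.decode(buf);
        reader.consume(consumed);
    }
    sizes
}

fn main() {
    let mut decoder = ToyDecoder { batch_size: 2, pending_rows: 0 };
    let sizes = fill_sizes(Cursor::new("a\nb\nc\nd\n"), &mut decoder);
    // Without the early break, the loop would keep issuing fill_buf
    // calls after the batch was complete, like the trailing entries
    // in the `[23, 3, 3, 0, 0]` trace.
    println!("{:?}", sizes);
}
```

With the check in place, the in-memory `Cursor` is filled exactly once and the loop exits as soon as two rows are buffered, rather than re-polling the reader.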
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]