alamb commented on code in PR #3677:
URL: https://github.com/apache/arrow-rs/pull/3677#discussion_r1102705589


##########
arrow-csv/src/reader/mod.rs:
##########
@@ -438,10 +438,10 @@ impl<R: BufRead> BufReader<R> {
         loop {
             let buf = self.reader.fill_buf()?;
             let decoded = self.decoder.decode(buf)?;
-            if decoded == 0 {
+            self.reader.consume(decoded);
+            if decoded == 0 || self.decoder.capacity() == 0 {

Review Comment:
   this has the effect of potentially creating smaller output batches, right? 
Basically the reader will now yield up any batches it already has buffered. 



##########
arrow-csv/src/reader/mod.rs:
##########
@@ -2269,4 +2274,73 @@ mod tests {
             "Csv error: Encountered invalid UTF-8 data for line 1 and field 1",
         );
     }
+

Review Comment:
   I am sorry if I am mis understanding the rationale for this change, but I 
don't understand how this tests mimics what is described in 
https://github.com/apache/arrow-rs/issues/3674 -- namely that data that has 
been read can be decoded and read out as record batches prior to sending the 
end of stream.
   
   I wonder can we write a test like
   
   1. set the batch size to 5 (bigger than the output data)
   2.send in "foo,bar\nbaz,foo\n"
   3. Read those two  records <-- as I understand this is what can not be done 
today
   4. Feed in "a,b\nc,d" + EOS
   5. read the final two records
   
   I struggle how to map this test to that usecase -- it would be hard for me 
to understand the purpose of this test if I saw it without context. E.g. why is 
it important that there  were two calls to `fill(0)`?



##########
arrow-csv/src/lib.rs:
##########
@@ -17,6 +17,8 @@
 
 //! Transfer data between the Arrow memory format and CSV (comma-separated 
values).
 
+extern crate core;

Review Comment:
   What is the purpose of this line?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to