[GitHub] [arrow] westonpace commented on pull request #10568: ARROW-11889: [C++] Add parallelism to streaming CSV reader

GitBox Thu, 01 Jul 2021 01:27:25 -0700


westonpace commented on pull request #10568:
URL: https://github.com/apache/arrow/pull/10568#issuecomment-872039323



   I've rebased in the changes from #10509.  The behavior is only slightly 
different.  Opening the streaming CSV reader reads in the first record batch, 
so bytes_read will already reflect that batch before the consumer has read 
anything.  After that, each time a batch is read, the next batch is read in.  
This means the read will not increment bytes_read.  When reading in parallel, 
bytes_read could be even further ahead of the consumer, since decoding happens 
in readahead.  It should still match the spirit of the feature, which is to 
report how many bytes have been decoded.
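
   To make the accounting concrete, here is a minimal toy model of the 
behavior described above.  It is not the Arrow implementation: the class 
ToyStreamingReader, its readahead parameter, and the idea that a "batch" is 
just a string are all assumptions made for illustration.  It shows one reading 
of the comment, in which bytes_read counts decoded bytes and therefore runs 
ahead of what the consumer has received.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model only. This is NOT Arrow's streaming CSV reader; every name here
// is hypothetical. A "batch" is simply the raw text of one block.
class ToyStreamingReader {
 public:
  // Opening the reader eagerly decodes up to `readahead` batches, so
  // bytes_read() is already non-zero before the first ReadNext() call.
  explicit ToyStreamingReader(std::vector<std::string> blocks,
                              std::size_t readahead = 1)
      : blocks_(std::move(blocks)), readahead_(readahead) {
    Fill();
  }

  // Hands out a batch that was decoded earlier, then tops the queue back up.
  // Returns std::nullopt once the input is exhausted.
  std::optional<std::string> ReadNext() {
    if (decoded_.empty()) return std::nullopt;
    std::string batch = std::move(decoded_.front());
    decoded_.pop_front();
    Fill();
    return batch;
  }

  // Bytes decoded so far, which may exceed the bytes handed to the consumer.
  std::int64_t bytes_read() const { return bytes_read_; }

 private:
  // "Decode" blocks until the readahead queue is full or input runs out.
  void Fill() {
    while (decoded_.size() < readahead_ && next_ < blocks_.size()) {
      bytes_read_ += static_cast<std::int64_t>(blocks_[next_].size());
      decoded_.push_back(blocks_[next_]);
      ++next_;
    }
  }

  std::vector<std::string> blocks_;
  std::size_t readahead_;
  std::size_t next_ = 0;
  std::deque<std::string> decoded_;
  std::int64_t bytes_read_ = 0;
};
```

   With three blocks and a readahead of 1, bytes_read in this model covers the 
first block right after construction, grows on the first two reads (each one 
triggers decoding of the following block), and does not change on the third 
read because nothing is left to decode.  A larger readahead value moves 
bytes_read further ahead of the consumer, which is the parallel case mentioned 
above.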
   
   @n3world @pitrou review is welcome.  The CI failure is unrelated.


