[GitHub] [arrow] n3world commented on pull request #10794: ARROW-13441: [C++][CSV] Skip empty batches in column decoder

GitBox Mon, 26 Jul 2021 08:24:13 -0700


n3world commented on pull request #10794:
URL: https://github.com/apache/arrow/pull/10794#issuecomment-886799871



   > > Yes, that will, because the type is not yet known. This test seems 
artificial in that it doesn't follow how the column decoder is actually used. 
In use, all empty record batches get discarded, so their types don't actually 
matter.
   > 
   > Ah, I see. Can you point me to where this happens exactly, though?
   
   For the first block it happens at 
`StreamingReaderImpl::InitAfterFirstBatch:920-925`. Looking at this again, the 
schema is already captured at that point, so that is an issue.
   
   I can't find where it happens for any block other than the first. 
I know it used to be in the previous streaming reader, 
`SerialStreamingReader::ReadNextSkippingEmpty`. @westonpace, was this an 
intentional change or an accident? Either way, the CSV reader needs a little 
more work so that it can consume the empty leading blocks and capture the 
schema only once it reaches the first block with data.
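
   To make the intended behavior concrete, here is a minimal sketch of 
skipping empty blocks and capturing the schema from the first block that has 
data. It is not Arrow code: `Batch` and `SkippingReader` are hypothetical 
stand-ins, the method only borrows the `ReadNextSkippingEmpty` name mentioned 
above, and its body is an assumption about what such a loop might look like.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded record batch (not an Arrow type).
struct Batch {
  std::vector<std::string> column_types;  // inferred types; not meaningful when num_rows == 0
  std::size_t num_rows;
};

// Hypothetical reader that discards empty batches and defers schema capture
// until the first batch that actually contains rows.
class SkippingReader {
 public:
  explicit SkippingReader(std::vector<Batch> batches)
      : batches_(std::move(batches)) {}

  // Returns the next non-empty batch, or std::nullopt at end of input.
  std::optional<Batch> ReadNextSkippingEmpty() {
    while (pos_ < batches_.size()) {
      Batch batch = batches_[pos_++];
      if (batch.num_rows == 0) {
        continue;  // empty batch: its inferred types are ignored
      }
      if (!schema_) {
        schema_ = batch.column_types;  // capture the schema exactly once
      }
      return batch;
    }
    return std::nullopt;
  }

  // Empty until a batch with data has been read.
  const std::optional<std::vector<std::string>>& schema() const {
    return schema_;
  }

 private:
  std::vector<Batch> batches_;
  std::size_t pos_ = 0;
  std::optional<std::vector<std::string>> schema_;
};
```

   In this sketch the types inferred for empty batches never reach the schema, 
which is the property the test discussed above would need to rely on.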


