[GitHub] [arrow-rs] jorgecarleitao commented on issue #1059: Allow csv reader builder to do schema inference even when reading csv from stdin

GitBox Sun, 19 Dec 2021 22:14:46 -0800


jorgecarleitao commented on issue #1059:
URL: https://github.com/apache/arrow-rs/issues/1059#issuecomment-997627871



   > I'm a beginner at Rust so please bear with me.
   
   No worries, we are all learning :)
   
   > How do you decide how big the buffer will get?
   
   I think we can't: a single CSV line can be megabytes long. Part of parsing 
CSV is deciding how many bytes are needed per "cell" (row, column).
   
   The code above keeps allocating for as long as inference requires it, so 
the buffer size is driven by the number of lines used for inference and the 
size of those first lines in the specific CSV.
   
   > Does your code allow to release the buffer once schema inference has been 
completed? That's how it would be best: read into memory until schema inference 
is done, then release.
   
   Not sure I understood: isn't the idea that we want to both infer _and_ 
parse those lines into arrow, but we can't seek to repeat the operation? If 
that is the case, I think that we can't release the buffer once the inference 
is done: we can only release it once the data has been inferred _and_ parsed 
into columns. The code above works under this hypothesis: once the read 
position moves past the end of the buffer, we release it (see the 
`// release memory` comment in the code).
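   
   To make that hypothesis concrete, here is a minimal sketch using only the 
standard library. It is not the code referred to above and not an arrow-rs 
API: `ReplayReader`, `buffer_lines`, `buffered` and `is_released` are 
hypothetical names chosen for illustration. The first lines are held in 
memory so inference can inspect them, the same bytes are then replayed to the 
parser, and the buffer is dropped only once the read position has moved past 
its end.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper (not part of arrow-rs): replays the bytes that were
/// buffered for inference, then continues with the underlying reader.
pub struct ReplayReader<R: Read> {
    buffer: Option<Vec<u8>>, // bytes consumed while inferring the schema
    position: usize,         // how much of `buffer` has been replayed so far
    inner: R,                // non-seekable source, e.g. stdin
}

impl<R: BufRead> ReplayReader<R> {
    /// Reads up to `max_lines` lines into memory. The buffer grows with the
    /// size of those lines, so it cannot be bounded in advance.
    pub fn buffer_lines(mut inner: R, max_lines: usize) -> io::Result<Self> {
        let mut buffer = Vec::new();
        for _ in 0..max_lines {
            // `read_until` appends to `buffer`; 0 bytes read means end of input.
            if inner.read_until(b'\n', &mut buffer)? == 0 {
                break;
            }
        }
        Ok(Self { buffer: Some(buffer), position: 0, inner })
    }
}

impl<R: Read> ReplayReader<R> {
    /// The lines available for schema inference (empty once released).
    pub fn buffered(&self) -> &[u8] {
        self.buffer.as_deref().unwrap_or(&[])
    }

    /// True once the inference buffer has been dropped.
    pub fn is_released(&self) -> bool {
        self.buffer.is_none()
    }
}

impl<R: Read> Read for ReplayReader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if let Some(buffer) = self.buffer.as_ref() {
            if self.position < buffer.len() {
                // Still replaying the lines that inference already saw.
                let remaining = &buffer[self.position..];
                let n = remaining.len().min(out.len());
                out[..n].copy_from_slice(&remaining[..n]);
                self.position += n;
                return Ok(n);
            }
        }
        // release memory: the position is past the buffered bytes
        self.buffer = None;
        self.inner.read(out)
    }
}
```

   With stdin, this would be built from `std::io::stdin().lock()`: run 
inference over `buffered()`, then hand the `ReplayReader` itself to the 
parser, which sees the stream from the first byte.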


