[GitHub] [arrow] wesm commented on pull request #2576: ARROW-25: [C++] Implement CSV reader

GitHub Mon, 01 Oct 2018 04:56:25 -0700

> Doing the strided memory read here is not worse than doing it somewhere else.


What I'm actually saying is not to do strided memory access at all. You can 
build contiguous buffers for each column while you are tokenizing. 
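
To make that concrete, here is a rough sketch of the idea, not the code in this PR. `ColumnBuffer` and `TokenizeColumnar` are hypothetical names invented for illustration, and the tokenizer is deliberately simplified: it assumes no quoting or escaping, `\n` line endings, and a column count known up front. Each cell's bytes are appended to its column's own buffer as soon as the cell is tokenized, so the conversion step can later scan one column linearly instead of striding across rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-column storage: the bytes of every cell in one column,
// stored back to back, plus offsets marking the cell boundaries
// (offsets.size() == number of cells + 1).
struct ColumnBuffer {
  std::string data;
  std::vector<int32_t> offsets{0};

  size_t size() const { return offsets.size() - 1; }

  // Copy of the i-th cell's bytes.
  std::string cell(size_t i) const {
    return data.substr(offsets[i], offsets[i + 1] - offsets[i]);
  }
};

// Simplified tokenizer (assumptions: no quoting/escaping, '\n' line endings).
// Cells beyond num_columns are dropped; short rows are padded with empty cells.
std::vector<ColumnBuffer> TokenizeColumnar(const std::string& csv,
                                           size_t num_columns,
                                           char delim = ',') {
  std::vector<ColumnBuffer> columns(num_columns);
  size_t col = 0;    // index of the column the current cell belongs to
  size_t start = 0;  // byte offset where the current cell begins

  // Append the cell [start, end) to the buffer of the current column.
  auto flush_cell = [&](size_t end) {
    if (col < num_columns) {
      ColumnBuffer& c = columns[col];
      c.data.append(csv, start, end - start);
      c.offsets.push_back(static_cast<int32_t>(c.data.size()));
    }
  };

  // Finish a row: emit its last cell, then pad any missing trailing cells.
  auto end_row = [&](size_t end) {
    flush_cell(end);
    for (size_t c = col + 1; c < num_columns; ++c) {
      columns[c].offsets.push_back(
          static_cast<int32_t>(columns[c].data.size()));
    }
    col = 0;
  };

  for (size_t i = 0; i < csv.size(); ++i) {
    if (csv[i] == delim) {
      flush_cell(i);
      ++col;
      start = i + 1;
    } else if (csv[i] == '\n') {
      end_row(i);
      start = i + 1;
    }
  }
  // Handle a final row that has no trailing newline.
  if (start < csv.size() || col > 0) {
    end_row(csv.size());
  }
  return columns;
}
```

Under these assumptions, calling `TokenizeColumnar("a,1\nbb,22\n", 2)` leaves column 0 holding the bytes `abb` and column 1 holding `122`, each with its own offsets, so a later type-conversion pass reads each column front to back.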

I'll spend some time working on this (flamegraphing, etc.) so that I can 
assure myself of the best approach to tokenize-convert. I have been dissatisfied 
with the pandas CSV reader for years but was never motivated enough to do 
anything about it because of pandas's underlying internal problems. I'd like to 
turn this CSV reader into something that we can use for the next 20 years (!)

[ Full content available at: https://github.com/apache/arrow/pull/2576 ]
This message was relayed via gitbox.apache.org for [email protected]
