Hmm, I don't think we can avoid a strided memory read in any case. We are 
converting from a row-wise layout (CSV) to a column-wise layout (Arrow), so 
one side of the copy has to be strided. Doing the strided memory read here is 
no worse than doing it somewhere else.
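
To make the point concrete, here is a minimal sketch (plain Python, not the 
Arrow implementation) of a row-wise to column-wise conversion. The function 
name `transpose_rows_to_columns` and the flat `cells` buffer are hypothetical, 
chosen only for illustration. Each column is gathered by stepping through the 
row-major buffer with a stride equal to the number of columns.

```python
def transpose_rows_to_columns(cells, num_cols):
    """Gather columns out of a flat, row-major list of parsed cells.

    `cells` holds the fields of row 0, then row 1, and so on.
    Hypothetical helper for illustration only.
    """
    if num_cols <= 0:
        raise ValueError("num_cols must be positive")
    if len(cells) % num_cols != 0:
        raise ValueError("cell count is not a multiple of num_cols")
    # Column j is every num_cols-th cell starting at offset j:
    # this slice is the strided read over the row-major buffer.
    return [cells[j::num_cols] for j in range(num_cols)]


# Three CSV rows with three fields each, flattened in row-major order.
rows = ["1,2.5,a", "2,3.5,b", "3,4.5,c"]
cells = [field for row in rows for field in row.split(",")]
columns = transpose_rows_to_columns(cells, 3)
```

Writing row by row into per-column buffers instead would move the stride to 
the write side rather than remove it.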

The varbinary conversion is a separate matter. I agree that if we get a CSV 
that is mostly binary / string columns, we might make parsing a bit faster 
still by avoiding an extra copy here. My intuition is that most useful CSV 
files (in the Arrow context) will have numeric columns, and parsing those is 
slow enough that micro-optimizing binary columns isn't really important.

[ Full content available at: https://github.com/apache/arrow/pull/2576 ]
This message was relayed via gitbox.apache.org for [email protected]
