Hmm, I don't think we can avoid a strided memory read in any case. We are converting from a row-wise layout (CSV) to a column-wise layout (Arrow). Doing the strided memory read here is not worse than doing it somewhere else.
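
To make that concrete, here is a minimal sketch of the two ways to do the row-to-column conversion. It is not the code in this PR: the function names and the flat `int` row buffer are hypothetical, chosen only to show where the stride ends up. Either the reads are strided and the writes contiguous, or the other way around.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration, not Arrow's CSV reader. `rows` holds already
// parsed values in row-major order (num_rows x num_cols), the order in
// which a CSV parser encounters them.

// Column at a time: writes are contiguous, reads are strided by num_cols.
std::vector<std::vector<int>> ColumnsByStridedRead(
    const std::vector<int>& rows, std::size_t num_rows, std::size_t num_cols) {
  std::vector<std::vector<int>> columns(num_cols);
  for (std::size_t c = 0; c < num_cols; ++c) {
    columns[c].reserve(num_rows);
    for (std::size_t r = 0; r < num_rows; ++r) {
      columns[c].push_back(rows[r * num_cols + c]);  // stride = num_cols
    }
  }
  return columns;
}

// Row at a time: reads are contiguous, writes jump between column buffers.
std::vector<std::vector<int>> ColumnsByStridedWrite(
    const std::vector<int>& rows, std::size_t num_rows, std::size_t num_cols) {
  std::vector<std::vector<int>> columns(num_cols, std::vector<int>(num_rows));
  for (std::size_t r = 0; r < num_rows; ++r) {
    for (std::size_t c = 0; c < num_cols; ++c) {
      columns[c][r] = rows[r * num_cols + c];  // each write hits another buffer
    }
  }
  return columns;
}
```

Both functions produce the same columns; the only difference is which side of the copy pays for the non-contiguous access.
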
The varbinary conversion is a separate matter. I agree that if we get a CSV that is mostly binary / string columns, we might make parsing a bit faster still by avoiding an extra copy here. My intuition is that most useful CSV files (in the Arrow context) will have numeric columns, and parsing those is slow enough that micro-optimizing binary columns isn't really important.
[ Full content available at: https://github.com/apache/arrow/pull/2576 ]
