Dandandan opened a new pull request #9090: URL: https://github.com/apache/arrow/pull/9090
This came up in https://github.com/apache/arrow/pull/9084 when discussing the PR with @jorgecarleitao.

One of the ideas here is that we could take advantage of the `cast` kernel in Arrow to parse the CSV columns, deduplicating code and making more use of Arrow. This lets us get rid of the custom parsers and the code for parsing / building primitive arrays.

When loading the CSV files in the TPC-H benchmark, it looks like this has a small negative effect on performance.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
