westonpace commented on PR #14269: URL: https://github.com/apache/arrow/pull/14269#issuecomment-1343280843
> However, I also noticed a potential problem with the current generator usage in the CSV reader that needs to be investigated: https://github.com/apache/arrow/issues/14792

I think we're OK here because we aren't actually consuming that generator async-reentrantly. The apply generator can be called re-entrantly (which would be a problem), but this PR appears to use MakeSerialReadaheadGenerator. I seem to recall that we ran into a bug when we tried MakeReadaheadGenerator, and I wonder if this was it.

So the current implementation will do parallel I/O, which is nice, and will interleave parsing and decoding. However, it does not do parallel parsing. I think we eventually want that, but we will need to find a better way of handling #14792. Given that parallel I/O already gives a nice benefit when reading from S3, perhaps parallel parsing could be left for a follow-up PR.
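To make the re-entrancy distinction concrete, here is a minimal Python/asyncio sketch. It is a conceptual model only, not Arrow's C++ implementation: `NonReentrantSource`, `serial_readahead` and `reentrant_readahead` are hypothetical names. It assumes the behavior described for Arrow's readahead helpers: a serial readahead still prefetches into a queue but never has more than one call to the source outstanding, while a plain readahead issues several source calls concurrently and therefore requires an async-reentrant source.

```python
import asyncio


class NonReentrantSource:
    """Hypothetical stateful source that must not be re-entered: a second call
    made while the previous call is still pending raises RuntimeError."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0
        self._busy = False

    async def __call__(self):
        if self._busy:
            raise RuntimeError("source called re-entrantly")
        self._busy = True
        try:
            await asyncio.sleep(0.001)  # stand-in for I/O latency
            if self._pos >= len(self._items):
                return None  # None marks end of stream
            item = self._items[self._pos]
            self._pos += 1
            return item
        finally:
            self._busy = False


def serial_readahead(source, max_queued):
    """Prefetch up to max_queued items into a queue while keeping at most one
    source call in flight. Must be called from inside a running event loop."""
    queue = asyncio.Queue(maxsize=max_queued)

    async def pump():
        while True:
            item = await source()  # the next call is issued only after this one finishes
            await queue.put(item)
            if item is None:
                return

    pump_task = asyncio.create_task(pump())

    async def next_item():
        return await queue.get()

    next_item.pump_task = pump_task  # keep a reference so the task is not collected
    return next_item


async def reentrant_readahead(source, n):
    """Issue n source calls at once without waiting; only safe if the source
    is async-reentrant."""
    return await asyncio.gather(*(source() for _ in range(n)))


async def demo_serial():
    src = NonReentrantSource(["block0", "block1", "block2"])
    next_item = serial_readahead(src, max_queued=2)
    out = []
    while (item := await next_item()) is not None:
        out.append(item)
    return out


async def demo_reentrant():
    src = NonReentrantSource(["block0", "block1", "block2"])
    try:
        await reentrant_readahead(src, 3)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(demo_serial()))     # ['block0', 'block1', 'block2']
    print(asyncio.run(demo_reentrant()))  # source called re-entrantly
```

In this model the serial variant delivers every block in order from the non-reentrant source, while the concurrent variant trips the re-entrancy check on its second call. The trade-off mirrors the comment: serial readahead keeps I/O ahead of the consumer, but any CPU-heavy work done inside the source (such as parsing) stays serialized.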
