westonpace commented on issue #33759: URL: https://github.com/apache/arrow/issues/33759#issuecomment-1416354105
> I thought it was likely related, since both issues occur when using `to_batches()` on small data; the difference is that I am reading directly from a mounted disk while the OP is reading over the network. If the scanner is the cause, as some comments have suggested, a fix would resolve both of our issues.

The OP's issue has been identified, they have found a workaround (don't store the full metadata in each file), and we have identified a long-term fix (#33888). That problem and its fix have nothing to do with #33624. In #33624 the total data transferred is larger than the on-disk size of the data, which would not be caused by Arrow retaining metadata in RAM.
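For illustration, here is a minimal PyArrow sketch of that kind of workaround, assuming the metadata in question is schema-level key-value metadata duplicated in each file's footer; the table and output path are placeholders, not taken from the issue:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Placeholder table standing in for one partition of the dataset.
table = pa.table({"x": list(range(1000))})

# Drop the schema-level key-value metadata so it is not duplicated
# in every file's footer, where the scanner would otherwise read
# (and potentially retain) it for each fragment it opens.
stripped = table.replace_schema_metadata(None)
pq.write_table(stripped, "part-0.parquet")
```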
