pitrou commented on PR #13442:
URL: https://github.com/apache/arrow/pull/13442#issuecomment-1171321955
> If we don't have any real-world benchmarks that can help estimate scaling characteristics on large files, then we should be able to tell whether the 16MB buffer size affects the micro-benchmarks.

Right, but that wouldn't tell us much otherwise.

> In the case where a file has rows larger than 1MB, it's required to set the block size anyway, so I'm not following what the counter-argument to this PR regarding the performance of large files is.

A large file does not necessarily have large rows. Large CSV files can typically have very small rows, and I expect at least _some_ large JSON files to be similar.
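To make the distinction between "large file" and "large rows" concrete: whether the block size has to be raised depends on the longest single row, not on the total file size. The following is a minimal, stdlib-only sketch under stated assumptions, not Arrow code. `longest_row_bytes` and `needs_larger_block` are hypothetical helpers. It assumes newline-delimited rows, assumes a reader fails when one row does not fit in a single block, and treats the 1MB default mentioned in this thread as `1 << 20` bytes.

```python
# Hypothetical illustration, not part of Arrow/pyarrow.
# Assumption: a block-based reader needs every single row to fit in one block,
# so the minimum usable block size is driven by the longest row, not file size.
DEFAULT_BLOCK_SIZE = 1 << 20  # assumed value of the "1MB" default discussed above


def longest_row_bytes(data: bytes) -> int:
    """Return the length in bytes of the longest newline-delimited row."""
    return max((len(row) for row in data.split(b"\n")), default=0)


def needs_larger_block(data: bytes, block_size: int = DEFAULT_BLOCK_SIZE) -> bool:
    """True if at least one row would not fit in a single block of `block_size` bytes."""
    return longest_row_bytes(data) > block_size


# A "large" file made of many small rows: 18 bytes per row, 18,000,000 bytes total.
small_rows = b'{"a": 1234567890}\n' * 1_000_000
# A comparatively small file holding a single row of more than 2 MiB.
one_big_row = b'{"s": "' + b"x" * (2 << 20) + b'"}\n'

print(len(small_rows), needs_larger_block(small_rows))
print(len(one_big_row), needs_larger_block(one_big_row))
```

The first input is far larger than the assumed default block size yet never needs a bigger block; the second is smaller overall but does. In pyarrow, the block size is set through the `block_size` parameter of `pyarrow.json.ReadOptions` (or `pyarrow.csv.ReadOptions`).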
