pitrou commented on PR #13442: URL: https://github.com/apache/arrow/pull/13442#issuecomment-1171334430
We don't really. I think the basic design of having a fixed block size was copied from the CSV reader, where 1MB was measured to be a good compromise on some actual files. More abstractly, the block size is meant to allow for decent CPU cache locality, for which 16MB feels a bit too large in the general case. In the end, it would be better to have meaningful macro-benchmarks, both for CSV and for JSON. But that needs someone to spend time adding those benchmarks...
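To make the benchmarking idea above concrete, here is a rough sketch of how one might compare block sizes. It is a toy, not Arrow's reader: it only counts newlines in an in-memory buffer, and the names `count_rows_in_blocks` and `time_block_sizes` are made up for illustration. A real macro-benchmark would run the actual CSV and JSON readers over representative files.

```python
import io
import time


def count_rows_in_blocks(data: bytes, block_size: int) -> int:
    """Count newline-terminated rows, reading `data` in fixed-size blocks."""
    stream = io.BytesIO(data)
    rows = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Counting newlines stands in for the real per-block parsing work.
        rows += block.count(b"\n")
    return rows


def time_block_sizes(data: bytes, block_sizes, repeats: int = 3) -> dict:
    """Return {block_size: best wall-clock seconds} over `repeats` runs."""
    timings = {}
    for size in block_sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            count_rows_in_blocks(data, size)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


if __name__ == "__main__":
    # Synthetic newline-delimited JSON; real files would be more telling.
    sample = b'{"a": 1, "b": "x"}\n' * 200_000
    for size, secs in time_block_sizes(sample, [1 << 20, 16 << 20]).items():
        print(f"{size >> 20} MiB blocks: {secs:.4f} s")
```

The timings from a toy like this say little about cache behaviour in the real parsers. The point is only the shape of the comparison: same input, several block sizes, best of a few runs.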