pitrou commented on PR #13442:
URL: https://github.com/apache/arrow/pull/13442#issuecomment-1171334430

   We don't really. I think the basic design of having a fixed block size was 
copied from the CSV reader, where 1 MB was measured to be a good compromise on 
some actual files. More abstractly, the block size is meant to allow for decent 
CPU cache locality, and 16 MB feels a bit too large for that in the general case.
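   
   For reference, the knob under discussion is the `block_size` field of the readers' `ReadOptions`. A minimal, illustrative sketch (not code from this PR; the helper name, file path, and value passed in are placeholders) of overriding it for the C++ JSON reader:

```cpp
#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/json/api.h>

#include <memory>
#include <string>

arrow::Result<std::shared_ptr<arrow::Table>> ReadJsonWithBlockSize(
    const std::string& path, int32_t block_size) {
  // Open the input file; "path" is just a placeholder.
  ARROW_ASSIGN_OR_RAISE(auto input, arrow::io::ReadableFile::Open(path));

  // Override the default block size (1 MB) used to chunk the input
  // before parsing; this is the setting discussed above.
  auto read_options = arrow::json::ReadOptions::Defaults();
  read_options.block_size = block_size;

  auto parse_options = arrow::json::ParseOptions::Defaults();

  ARROW_ASSIGN_OR_RAISE(
      auto reader,
      arrow::json::TableReader::Make(arrow::default_memory_pool(), input,
                                     read_options, parse_options));
  return reader->Read();
}
```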
   
   In the end, it would be better to have meaningful macro-benchmarks for both 
CSV and JSON, but that requires someone spending the time to write those 
benchmarks...
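   
   As a rough sketch only (not part of this PR), such a macro-benchmark could compare block sizes with Google Benchmark, which the Arrow C++ tree already uses; the generated input and the 1 MB / 16 MB arguments below are illustrative assumptions:

```cpp
#include <benchmark/benchmark.h>

#include <arrow/api.h>
#include <arrow/io/memory.h>
#include <arrow/json/api.h>

#include <memory>
#include <string>

namespace {

// Build an in-memory newline-delimited JSON payload to read repeatedly.
std::shared_ptr<arrow::Buffer> MakeJsonInput() {
  std::string data;
  for (int i = 0; i < 100000; ++i) {
    data += "{\"x\": " + std::to_string(i) + ", \"y\": \"some text\"}\n";
  }
  return arrow::Buffer::FromString(std::move(data));
}

void BM_ReadJsonBlockSize(benchmark::State& state) {
  auto buffer = MakeJsonInput();
  auto read_options = arrow::json::ReadOptions::Defaults();
  read_options.block_size = static_cast<int32_t>(state.range(0));
  auto parse_options = arrow::json::ParseOptions::Defaults();

  for (auto _ : state) {
    // Re-read the same buffer with the block size given as the benchmark arg.
    auto input = std::make_shared<arrow::io::BufferReader>(buffer);
    auto reader = arrow::json::TableReader::Make(arrow::default_memory_pool(),
                                                 input, read_options,
                                                 parse_options)
                      .ValueOrDie();
    auto table = reader->Read().ValueOrDie();
    benchmark::DoNotOptimize(table);
  }
  state.SetBytesProcessed(state.iterations() * buffer->size());
}

}  // namespace

// Compare a 1 MB block (the CSV reader's default) against 16 MB.
BENCHMARK(BM_ReadJsonBlockSize)->Arg(1 << 20)->Arg(16 << 20);

BENCHMARK_MAIN();
```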

