[GitHub] [arrow] pitrou commented on pull request #13442: ARROW-9612: [C++] increase default block_size from 1MB to 16MB

GitBox Thu, 30 Jun 2022 08:04:01 -0700


pitrou commented on PR #13442:
URL: https://github.com/apache/arrow/pull/13442#issuecomment-1171334430


We don't really. I think the basic design of having a fixed block size was
copied from the CSV reader where 1MB was measured to be a good compromise on
some actual files. More abstractly, the block size is meant to allow for decent
CPU cache locality, for which 16MB feels a bit too large in the general case.

In the end, it would be better to have significant macro-benchmarks, both
for CSV and for JSON. But that needs someone to spend time adding those
benchmarks...


