westonpace commented on issue #13142:
URL: https://github.com/apache/arrow/issues/13142#issuecomment-1125855614
There would be two potential concerns at that point.
1. Parquet has to store metadata for each row group, and that metadata has to
be read and parsed. If your row groups are too small, you will get poor read
performance because a disproportionate amount of time is spent handling
metadata.
2. Depending on how many columns you have and whether those columns are
compressed, individual column chunks may become very small (e.g. 1 MB), and
you will end up issuing a lot of non-contiguous reads to the disk. If your
disk is a spinning disk (i.e. an HDD), this can hurt your I/O bandwidth. A
quick way to check for both problems is sketched below.
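
Here is a minimal sketch, assuming pyarrow and a hypothetical file path
`data.parquet`, that inspects per-row-group metadata to spot both issues: the
row-group count (concern 1) and the average compressed column-chunk size
(concern 2). The `row_group_size` in the rewrite step is purely illustrative.

```python
import pyarrow.parquet as pq

# Hypothetical input file; substitute your own path.
pf = pq.ParquetFile("data.parquet")
meta = pf.metadata
print(f"row groups: {meta.num_row_groups}")

for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    # Average compressed size of the column chunks in this row group; if this
    # is very small (around 1 MB or less), scans of a single column turn into
    # many scattered reads.
    chunk_bytes = [rg.column(j).total_compressed_size
                   for j in range(rg.num_columns)]
    avg = sum(chunk_bytes) / max(len(chunk_bytes), 1)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"~{avg / (1024 * 1024):.2f} MiB per column chunk")

# If the chunks are too small, rewriting with larger row groups helps; the
# right size is workload-dependent, and 1_000_000 rows is just an example.
table = pq.read_table("data.parquet")
pq.write_table(table, "data_rewritten.parquet", row_group_size=1_000_000)
```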