westonpace commented on issue #13142:
URL: https://github.com/apache/arrow/issues/13142#issuecomment-1125855614
There would be two potential concerns at that point.
1. Parquet has to store metadata for each row group, and that metadata has to
be read and parsed. If your row groups are too small, you will get poor read
performance because a disproportionate amount of time is spent handling
metadata.
2. Depending on how many columns you have and whether those columns are
compressed, individual column chunks may become very small (e.g. 1 MB), and
you will end up issuing a lot of non-contiguous reads to the disk. If your
disk is a spinning disk (i.e. an HDD), this can hurt your I/O bandwidth. A
quick way to check for both problems is sketched below.
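
Here is a minimal sketch, assuming pyarrow and a hypothetical file path
`data.parquet`, that inspects per-row-group metadata to spot both issues: the
row-group count (concern 1) and the average compressed column-chunk size
(concern 2). The `row_group_size` in the rewrite step is purely illustrative.

```python
import pyarrow.parquet as pq

# Hypothetical input file; substitute your own path.
pf = pq.ParquetFile("data.parquet")
meta = pf.metadata
print(f"row groups: {meta.num_row_groups}")

for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    # Average compressed size of the column chunks in this row group; if this
    # is very small (around 1 MB or less), scans of a single column turn into
    # many scattered reads.
    chunk_bytes = [rg.column(j).total_compressed_size
                   for j in range(rg.num_columns)]
    avg = sum(chunk_bytes) / max(len(chunk_bytes), 1)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"~{avg / (1024 * 1024):.2f} MiB per column chunk")

# If the chunks are too small, rewriting with larger row groups helps; the
# right size is workload-dependent, and 1_000_000 rows is just an example.
table = pq.read_table("data.parquet")
pq.write_table(table, "data_rewritten.parquet", row_group_size=1_000_000)
```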