pitrou commented on issue #38389:
URL: https://github.com/apache/arrow/issues/38389#issuecomment-1777391943

   Ok, so after investigating this, one factor is that `pq.write_table` tries 
to use dictionary encoding by default. The heuristic only stops using 
dictionary encoding once the dictionary page size reaches the 
`dictionary_pagesize_limit` parameter, which is set to a very high default of 
1MB.
   
   However, the benchmark numbers are also **largely skewed** by the fact that 
the MB/s figure is computed relative to the _compressed and encoded_ size, 
not the final in-memory size.
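
   To make the skew concrete, here is a small arithmetic sketch. The byte 
counts and the elapsed time are made-up numbers chosen for easy division, not 
measurements from the benchmark, and `throughput_mb_s` is a hypothetical 
helper.

```python
def throughput_mb_s(num_bytes, seconds):
    """Return throughput in MB/s (1 MB taken as 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024) / seconds


# Illustrative values only: a file that is 10 MB once compressed and
# encoded, and 100 MB once decoded in memory, processed in 0.5 s.
encoded_bytes = 10 * 1024 * 1024
in_memory_bytes = 100 * 1024 * 1024
elapsed_seconds = 0.5

# Same operation, same elapsed time, two different denominators.
encoded_rate = throughput_mb_s(encoded_bytes, elapsed_seconds)
in_memory_rate = throughput_mb_s(in_memory_bytes, elapsed_seconds)

print(f"relative to encoded size:   {encoded_rate:.1f} MB/s")
print(f"relative to in-memory size: {in_memory_rate:.1f} MB/s")
```

   With these assumed numbers the same run reports 20 MB/s or 200 MB/s 
depending on which size is used, so the better a column compresses, the 
lower the figure looks when it is based on the encoded size.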


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
