pitrou commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-1777391943
OK, so after investigating this, one factor is that `pq.write_table` tries to use dictionary encoding by default. The heuristic for abandoning dictionary encoding is that the dictionary page size reaches the `dictionary_pagesize_limit` parameter, whose default of 1MB is very high. However, the benchmark numbers are also **largely skewed** by the fact that the MB/s figure is computed relative to the _compressed and encoded_ size, not the final in-memory size.
