pitrou commented on issue #38389:
URL: https://github.com/apache/arrow/issues/38389#issuecomment-1777412295

   And now the file sizes:
   * with encoding RLE_DICTIONARY (the default):
   ```
   -rw-rw-r-- 1 antoine antoine 255224210 oct.  24 16:53 foo-lz4.parquet
   -rw-rw-r-- 1 antoine antoine 258035659 oct.  24 16:53 foo.parquet
   -rw-rw-r-- 1 antoine antoine 292773361 oct.  24 16:53 foo-uncompressed.parquet
   ```
   * with encoding PLAIN:
   ```
   -rw-rw-r-- 1 antoine antoine 414232019 oct.  24 16:54 foo-lz4.parquet
   -rw-rw-r-- 1 antoine antoine 448751689 oct.  24 16:54 foo.parquet
   -rw-rw-r-- 1 antoine antoine 800112507 oct.  24 16:54 foo-uncompressed.parquet
   ```
   * with encoding DELTA_BINARY_PACKED:
   ```
   -rw-rw-r-- 1 antoine antoine 228584003 oct.  24 16:55 foo-lz4.parquet
   -rw-rw-r-- 1 antoine antoine 227823921 oct.  24 16:55 foo.parquet
   -rw-rw-r-- 1 antoine antoine 227811357 oct.  24 16:55 foo-uncompressed.parquet
   ```
   
   ... meaning that DELTA_BINARY_PACKED is probably a very good choice for integer columns.
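
To put the listings above side by side, here is a small sketch that relates each file to the largest one (PLAIN, uncompressed) and picks the smallest file per encoding. The byte counts are copied from the listings; the variable names are only illustrative, and nothing here says which codec produced `foo.parquet`, since the comment does not state it.

```python
# File sizes in bytes, copied verbatim from the `ls -l` listings above.
sizes = {
    "RLE_DICTIONARY": {
        "foo-lz4.parquet": 255224210,
        "foo.parquet": 258035659,
        "foo-uncompressed.parquet": 292773361,
    },
    "PLAIN": {
        "foo-lz4.parquet": 414232019,
        "foo.parquet": 448751689,
        "foo-uncompressed.parquet": 800112507,
    },
    "DELTA_BINARY_PACKED": {
        "foo-lz4.parquet": 228584003,
        "foo.parquet": 227823921,
        "foo-uncompressed.parquet": 227811357,
    },
}

# Baseline: PLAIN encoding without compression (the largest file).
baseline = sizes["PLAIN"]["foo-uncompressed.parquet"]

# Size of every file as a fraction of the baseline.
ratio_vs_plain = {
    encoding: {name: size / baseline for name, size in files.items()}
    for encoding, files in sizes.items()
}

# Smallest file name for each encoding.
smallest = {
    encoding: min(files, key=files.get) for encoding, files in sizes.items()
}

for encoding, files in sizes.items():
    for name, size in files.items():
        print(f"{encoding:20s} {name:26s} {size:>11d}  "
              f"{ratio_vs_plain[encoding][name]:.1%} of PLAIN uncompressed")
    print(f"{encoding:20s} smallest: {smallest[encoding]}")
```

On these particular numbers, the DELTA_BINARY_PACKED files are the smallest in every variant, and the compressed variants are slightly larger than the uncompressed one, so compression does not seem to help further once that encoding is applied. This is one dataset only, so it may not generalize.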

