blacha opened a new issue, #47085:
URL: https://github.com/apache/arrow/issues/47085

   ### Describe the enhancement requested
   
   The default compression level for zstd is currently set to 1:
   
   
https://github.com/apache/arrow/blob/45f562aabca3ecef6ee25a10960f54671c894a37/cpp/src/arrow/util/compression_internal.h#L74
   
   The defaults for the other compression algorithms are generally much higher:
   
   gzip: level 9 
https://github.com/apache/arrow/blob/45f562aabca3ecef6ee25a10960f54671c894a37/cpp/src/arrow/util/compression_internal.h#L47
   
   brotli: level 8 
https://github.com/apache/arrow/blob/45f562aabca3ecef6ee25a10960f54671c894a37/cpp/src/arrow/util/compression_internal.h#L34
   
   
   Could the default be changed to somewhere between 9 and 15? 
   
   In some rough comparisons with `gdal`, gzip at level 9 and brotli at level 8 
take roughly 4 to 7 seconds to convert a file to Parquet:
   
   ```
   $ time gdal vector convert road.fgb road.br.parquet --lco compression=brotli --overwrite --lco compression_level=8
   
   real 0m6.831s
   
   $ time gdal vector convert road.fgb road.gzip.parquet --lco compression=gzip --overwrite --lco compression_level=9
   
   real 0m4.020s
   ```
   
   With zstd at its default level 1, the same conversion takes approx 500 ms; 
raising the level to 15 brings the compression time roughly in line with brotli:
   
   ```
   $ time gdal vector convert road.fgb road.gzip.parquet --lco compression=zstd --overwrite
   
   real 0m0.423s
   
   $ time gdal vector convert road.fgb road.zstd.parquet --lco compression=zstd --overwrite --lco compression_level=15
   
   real 0m6.784s
   ```
   
   
   Decompression speed is meant to be approximately the same across all 
compression levels:
   
   > while decompression is uniformly fast, varying by less than 20% between 
the fastest and slowest levels [wikipedia](https://en.wikipedia.org/wiki/Zstd)
   
   There is also a zstd issue suggesting that level 15 may even have better 
decompression speed: https://github.com/facebook/zstd/issues/4091
   
   
   Ref: https://github.com/OSGeo/gdal/issues/12750
   
   
   ### Component(s)
   
   Parquet


