[GitHub] [hudi] alexeykudinkin commented on pull request #4214: [HUDI-2928] Switching default Parquet's column encoding to zstd

GitBox Mon, 13 Dec 2021 17:07:13 -0800


alexeykudinkin commented on pull request #4214:
URL: https://github.com/apache/hudi/pull/4214#issuecomment-993059693



   @vinothchandar @codope 
   
   Unfortunately, switching to Zstd might require a little more work 
than initially anticipated:
   
   The current Parquet version (1.10.1, pulled in by Spark 2.4.4) only 
supports the `ZstdCompressionCodec` provided by "hadoop-common", which in turn 
requires Hadoop to be built with native-library support (including compression 
codecs, etc.). Those native libraries are only supported on Linux/*nix.
   
   Therefore, if we're planning on supporting Spark 2.x, we have the following 
options: 
   
   1. Implement our own version of `ZstdCompressionCodec`, leveraging either 
zstd-jni (used by Spark internally) or airlift-aircompressor (which claims to be 
faster than the JNI implementation).
   2. Make `zstd` the default setting only for Spark 3 environments.
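
A minimal sketch of what the second option could look like, assuming the default is chosen from the Spark version string at startup. The class and method names here are hypothetical and not part of Hudi, and the `gzip` fallback for Spark 2.x is an assumption about what the pre-change default would remain:

```java
// Hypothetical helper, for illustration only; not an existing Hudi class.
final class ParquetCodecDefaults {

    // Returns the codec name to use as the default for a given Spark version.
    // Assumption: Spark 3+ ships a Parquet version whose zstd support does not
    // depend on Hadoop native libraries, so "zstd" is safe there; older
    // versions keep "gzip" (assumed fallback).
    static String defaultCodecFor(String sparkVersion) {
        // Take the major component of a version such as "2.4.4" or "3.1.2".
        int major = Integer.parseInt(sparkVersion.trim().split("\\.")[0]);
        return major >= 3 ? "zstd" : "gzip";
    }

    public static void main(String[] args) {
        System.out.println(defaultCodecFor("2.4.4"));
        System.out.println(defaultCodecFor("3.1.2"));
    }
}
```

Users on Spark 2.x whose Hadoop build does include native zstd support could still opt in explicitly through configuration; only the default would differ.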


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
