Re: ZSTD-JNI

Любен Thu, 21 May 2020 17:14:32 -0700

Hi,

I don't know of any performance or correctness problems with Zstd-JNI. It
tracks the upstream native library very closely and tries to expose most
of its functionality. Regarding the streaming interfaces, assuming that you
are going to use them, there are currently two approaches:


- ZstdInputStream/ZstdOutputStream filters that decompress/compress
streams, similar to the Gzip implementation from the standard library.
- Variants that work with direct buffers. If that fits how your code is
structured, they may be slightly faster.
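
To illustrate the first approach, here is a minimal sketch of the filter-stream
round trip. It uses the standard library's Gzip streams so that it runs without
extra dependencies. The class name FilterStreamRoundTrip and the two small
wrapper interfaces are hypothetical helpers for this sketch, not part of
Zstd-JNI. With Zstd-JNI on the classpath, passing ZstdOutputStream::new and
ZstdInputStream::new (package com.github.luben.zstd) in place of the Gzip
constructors should work the same way, since both are ordinary stream filters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FilterStreamRoundTrip {
    // Hypothetical helper types: they wrap a raw stream in a (de)compressing filter.
    interface OutWrapper { OutputStream wrap(OutputStream raw) throws IOException; }
    interface InWrapper { InputStream wrap(InputStream raw) throws IOException; }

    // Compress by writing through the filter; closing it flushes the final frame.
    static byte[] compress(byte[] data, OutWrapper codec) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = codec.wrap(sink)) {
            out.write(data);
        }
        return sink.toByteArray();
    }

    // Decompress by reading through the filter until end of stream.
    static byte[] decompress(byte[] data, InWrapper codec) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = codec.wrap(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("hello parquet ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Assumption: with Zstd-JNI available, swap in
        // ZstdOutputStream::new and ZstdInputStream::new here.
        byte[] packed = compress(original, GZIPOutputStream::new);
        byte[] restored = decompress(packed, GZIPInputStream::new);

        System.out.println(Arrays.equals(original, restored)); // prints true
    }
}
```

The direct-buffer variants follow a different, ByteBuffer-based calling pattern
and are not covered by this sketch.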

If you have any specific questions, please let me know. You can also send
me your PR when it's ready; I may have suggestions.

BTW, it's strange that Hadoop decided to reimplement it their own way. The
rest of the ecosystem is using Zstd-JNI, e.g. Spark, Flink, Cassandra, etc.

Regards,
luben




On Thu, May 21, 2020 at 2:34 AM Xinli Shang <[email protected]> wrote:

> Hi all,
>
> I see parquet-mr has been using ZSTD-JNI
> <https://github.com/luben/zstd-jni> for the parquet-cli
> <https://github.com/apache/parquet-mr/blob/master/parquet-cli/pom.xml#L48>
> project. Using this JNI to test ZSTD, instead of the Hadoop
> implementation, is a clean approach, especially when testing on localhost.
> I am wondering whether we can promote it to the parquet-hadoop project, as
> ZSTD is becoming more and more popular. I have a prototype working, but I
> would like to ask if anybody knows of any issues (performance, reliability,
> etc.) with ZSTD-JNI <https://github.com/luben/zstd-jni>. Any feedback on
> using this JNI is welcome.
>
> BTW, I am also trying out the AirCompressor
> <https://github.com/airlift/aircompressor> approach, but it seems the
> ZSTD compression level is not adjustable.
>
> --
> Xinli Shang
>
