Hi,

I don't know of any performance or correctness problems with Zstd-JNI. It closely tracks the upstream zstd library (the native part) and tries to expose most of its functionality. Regarding the streaming interfaces, assuming you are going to use them, there are currently two approaches:
- ZstdInputStream/ZstdOutputStream: filters that decompress/compress streams, similar to the Gzip implementation in the standard library.
- Variants that work with direct buffers. If they fit how your code is structured, they may be slightly faster.

If you have any specific questions, please let me know. You can also send me your PR when it's ready so I can make suggestions.

BTW, it's strange that Hadoop decided to reimplement it in its own way. The rest of the ecosystem uses Zstd-JNI, e.g. Spark, Flink, Cassandra, etc.

Regards,
luben

On Thu, May 21, 2020 at 2:34 AM Xinli shang wrote:
> Hi all,
>
> I see parquet-mr has been using ZSTD-JNI
> <https://github.com/luben/zstd-jni> for the parquet-cli
> <https://github.com/apache/parquet-mr/blob/master/parquet-cli/pom.xml#L48>
> project. It is a clean approach to use this JNI for testing ZSTD instead of
> using the Hadoop implementation, especially when testing on localhost. I am
> wondering whether we can promote it to the parquet-hadoop project as ZSTD
> becomes more and more popular. I have a prototype working, but I would like
> to ask if anybody knows of any issues (performance, reliability, etc.) with
> ZSTD-JNI <https://github.com/luben/zstd-jni>. Any feedback on using this
> JNI is welcome.
>
> BTW, I am also trying out the AirCompressor
> <https://github.com/airlift/aircompressor> approach, but it seems the
> ZSTD compression level is not adjustable.
>
> --
> Xinli Shang
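The first approach luben describes, stream filters analogous to the standard library's Gzip streams, can be sketched with the JDK alone. This is a hedged illustration, not zstd-jni code: it uses java.util.zip.GZIPOutputStream/GZIPInputStream so it compiles without extra dependencies, and the comments mark where com.github.luben.zstd.ZstdOutputStream and ZstdInputStream are assumed to slot in, since the reply describes them as filters wrapping an underlying stream. The class name StreamFilterRoundTrip is hypothetical, and readAllBytes/String.repeat require Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamFilterRoundTrip {

    // Compress by wrapping the sink in a filter stream.
    // Assumption: with zstd-jni on the classpath, this wrapper would be
    // replaced by new com.github.luben.zstd.ZstdOutputStream(sink).
    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(input);
        } // closing the filter finishes the compressed stream
        return sink.toByteArray();
    }

    // Decompress by wrapping the source in a filter stream.
    // Assumption: with zstd-jni, this wrapper would be replaced by
    // new com.github.luben.zstd.ZstdInputStream(source).
    static byte[] decompress(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "parquet page data ".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(original);
        byte[] restored = decompress(compressed);
        System.out.println(original.length + " -> " + compressed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

The point of the filter design is that the codec is only a wrapper around an existing stream, so switching codecs should touch just the two constructor calls. Check the zstd-jni documentation for its exact constructors, including any that accept a compression level, which is the setting Xinli found missing in AirCompressor.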
