Thank you so much, Luben! Here is the PR: <https://github.com/apache/parquet-mr/pull/793>. Please have a look!
On Wed, May 20, 2020 at 6:51 PM Любен <[email protected]> wrote:

> Hi,
>
> I don't know of any performance or correctness problems with Zstd-JNI. It
> tracks the upstream (the native part) very closely and tries to expose most
> of the functionality. Regarding streaming interfaces, assuming that you are
> going to use them, there are currently 2 approaches:
>
> - ZstdInputStream/ZstdOutputStream filters that decompress/compress
> streams, similar to the Gzip implementation from the standard library.
> - variants that work with direct buffers. If this fits how your code is
> structured, it may be slightly faster.
>
> If you have any specific questions, please let me know. You can also send
> me your PR when it's ready, and I may have suggestions.
>
> BTW, it's strange that Hadoop decided to reimplement it in its own way. The
> rest of the ecosystem uses Zstd-JNI, e.g. Spark, Flink, Cassandra, etc.
>
> Regards,
> luben
>
> On Thu, May 21, 2020 at 2:34 AM Xinli shang <[email protected]> wrote:
>
>> Hi all,
>>
>> I see parquet-mr has been using ZSTD-JNI
>> <https://github.com/luben/zstd-jni> for the parquet-cli
>> <https://github.com/apache/parquet-mr/blob/master/parquet-cli/pom.xml#L48>
>> project. Using this JNI for testing ZSTD, instead of the Hadoop
>> implementation, is a clean approach, especially when testing on localhost.
>> I am wondering whether we can promote it to the parquet-hadoop project as
>> ZSTD becomes more and more popular. I have a prototype working, but I would
>> like to ask whether anybody knows of any issues (performance, reliability,
>> etc.) with ZSTD-JNI <https://github.com/luben/zstd-jni>?
>> Any feedback on using this JNI is welcome.
>>
>> BTW, I am also trying out the AirCompressor
>> <https://github.com/airlift/aircompressor> approach,
>> but it seems the ZSTD compression level is not adjustable.
>>
>> --
>> Xinli Shang
>
-- 
Xinli Shang
