We checked many levels, such as 3, 7, 10, and 19, and maybe one or two more.
I can retry the experiments.
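For reference, a minimal sketch of the kind of sanity check we ran: if the level setting actually reaches the codec, a higher level should not produce larger output on compressible data. This uses zlib from the Python standard library as a stand-in, since zstd bindings are not in the stdlib; the payload and levels are illustrative only.

```python
# Sanity check: does the compression level actually affect output size?
# zlib stands in for zstd here (zstd bindings are not in the Python stdlib).
import zlib

data = b"parquet " * 10_000  # highly compressible payload

fast = zlib.compress(data, level=1)   # fastest, weakest compression
best = zlib.compress(data, level=9)   # slowest, strongest compression

print(len(fast), len(best))
# If the level knob were silently ignored, both sizes would be identical.
assert len(best) <= len(fast), "compression level had no effect on output size"
```

If both sizes come out identical for every level, the level is almost certainly being dropped somewhere before the codec, which matches what we observed with zstd.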

Regards
Manik Singla
+91-9996008893
+91-9665639677

"Life doesn't consist in holding good cards but playing those you hold
well."


On Tue, Oct 22, 2019 at 1:49 AM Radev, Martin <[email protected]> wrote:

> Hello Manik,
>
>
> If the compression level is really propagated to the library, what
> compression levels did you check?
>
>
> Regards,
>
> Martin
> ------------------------------
> *From:* Manik Singla <[email protected]>
> *Sent:* Monday, October 21, 2019 10:11:36 PM
> *To:* Parquet Dev
> *Cc:* [email protected]; Radev, Martin
> *Subject:* Re: custom CompressionCodec support
>
> Yes, that's the flag we tried, and we ensured it's getting read and propagated.
>
> Regards
> Manik Singla
> +91-9996008893
> +91-9665639677
>
> "Life doesn't consist in holding good cards but playing those you hold
> well."
>
>
> On Mon, Oct 21, 2019 at 12:51 PM Driesprong, Fokko <[email protected]>
> wrote:
>
>> Thanks Manik,
>>
>> Did you try setting the Hadoop io.compression.codec.zstd.level config?
>>
>> Cheers, Fokko
>>
>> Op za 19 okt. 2019 om 12:24 schreef Manik Singla <[email protected]>:
>>
>> > Hi Fokko and Martin
>> >
>> > We are using parquet-hadoop, which supports the compression codecs from
>> > parquet-format.
>> > In our case, we were getting the same compression ratio even after
>> > changing the zstd compression level.
>> > We confirmed that the configured level is being passed to
>> > ZStandardCompressor's init, which is a native call.
>> >
>> > To confirm the issue, we tried the same thing by injecting our own zstd
>> > implementation, and that seems to work fine.
>> > We will have a look at why it works for Spark and not for us.
>> >
>> > Regards
>> > Manik Singla
>> > +91-9996008893
>> > +91-9665639677
>> >
>> > "Life doesn't consist in holding good cards but playing those you hold
>> > well."
>> >
>> >
>> > On Fri, Oct 18, 2019 at 5:20 PM Driesprong, Fokko <[email protected]
>> >
>> > wrote:
>> >
>> > > Hi Falak,
>> > >
>> > > I was able to set the compression level in Spark using
>> > > spark.io.compression.zstd.level.
>> > >
>> > > Cheers, Fokko
>> > >
>> > > Op do 17 okt. 2019 om 20:53 schreef Radev, Martin <
>> [email protected]>:
>> > >
>> > > > Hi Falak,
>> > > >
>> > > >
>> > > > I was one of the people who recently exposed this to Arrow but this
>> is
>> > > not
>> > > > part of the Parquet specification.
>> > > >
>> > > > In particular, any implementation for writing parquet files can
>> decide
>> > > > whether to expose this or select a reasonable value internally.
>> > > >
>> > > >
>> > > > If you're using Arrow, you would have to read the documentation of
>> > > > the specified compressor. Arrow doesn't check whether the specified
>> > > > compression level is within the range supported by the codec. For
>> > > > ZSTD, the range should be [1, 22].
>> > > >
>> > > > Let me know if you're using Arrow, and I can check locally that
>> > > > there isn't by any chance a bug in propagating the value. At the
>> > > > moment there are only smoke tests verifying that nothing crashes.
>> > > >
>> > > >
>> > > > Regards,
>> > > >
>> > > > Martin
>> > > > ------------------------------
>> > > > *From:* Falak Kansal <[email protected]>
>> > > > *Sent:* Thursday, October 17, 2019 4:43:54 PM
>> > > > *To:* Driesprong, Fokko
>> > > > *Cc:* [email protected]
>> > > > *Subject:* Re: custom CompressionCodec support
>> > > >
>> > > > Hi Fokko,
>> > > >
>> > > > Thanks for replying, yes sure.
>> > > > The problem we are facing is that we are not able to control the
>> > > > compression level with Parquet zstd; we tried setting different
>> > > > compression levels, but it doesn't make any difference in the size.
>> > > > We have verified that *ZStandardCompressor* receives the same
>> > > > compression level that we set in the configuration file. Are we
>> > > > missing something? How can we set a different zstd compression
>> > > > level? Help would be appreciated.
>> > > >
>> > > > Thanks
>> > > > Falak
>> > > >
>> > > > On Thu, Oct 17, 2019 at 7:47 PM Driesprong, Fokko
>> <[email protected]
>> > >
>> > > > wrote:
>> > > >
>> > > > > Hi Manik,
>> > > > >
>> > > > > The supported compression codecs that ship with Parquet are tested
>> > > > > and validated in the CI pipeline. Sometimes there are issues with
>> > > > > compressors; therefore, they are not easily pluggable. Feel free
>> > > > > to open a PR against the project if you believe compressors are
>> > > > > missing, and then we can have a discussion.
>> > > > >
>> > > > > It is part of the Thrift definition:
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-format/blob/37bdba0a18cff18da706a0d353c65e726c8edca6/src/main/thrift/parquet.thrift#L470-L478
>> > > > >
>> > > > > Hope this clarifies the design decision.
>> > > > >
>> > > > > Cheers, Fokko
>> > > > >
>> > > > > Op di 15 okt. 2019 om 11:52 schreef Manik Singla <
>> > [email protected]
>> > > >:
>> > > > >
>> > > > >> Hi
>> > > > >>
>> > > > >> The current Java code is not open to using a custom compressor.
>> > > > >> I believe reads and writes are mostly done by the same
>> > > > >> team/company. In that case, it would be beneficial to add
>> > > > >> support so that users can plug in a new compressor easily,
>> > > > >> instead of making local changes, which are fragile across
>> > > > >> version upgrades.
>> > > > >>
>> > > > >> Do you think it would be worth adding?
>> > > > >>
>> > > > >> Regards
>> > > > >> Manik Singla
>> > > > >> +91-9996008893
>> > > > >> +91-9665639677
>> > > > >>
>> > > > >> "Life doesn't consist in holding good cards but playing those you
>> > hold
>> > > > >> well."
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
