I'm interested in this. I have been busy the last couple of weeks, so I have
not been able to take a closer look. I will try to give some feedback this
week.

Thanks

On Tue, Sep 3, 2019, 2:17 PM Radev, Martin <[email protected]> wrote:

> Hello all,
>
>
> thank you, Julien, for the interest.
>
>
> Could other people who are part of Apache Parquet share their opinions?
>
> Do you have your own data which you would like to use for testing the new
> encoding?
>
>
> Regards,
>
> Martin
>
> ________________________________
> From: Julien Le Dem <[email protected]>
> Sent: Friday, August 30, 2019 2:38:37 AM
> To: [email protected]
> Cc: Raoofy, Amir; Karlstetter, Roman
> Subject: Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet
>
> I think this looks promising. At first glance it seems to combine
> simplicity and efficiency.
> I'd like to hear more from other members of the PMC.
>
> On Tue, Aug 27, 2019 at 5:30 AM Radev, Martin <[email protected]> wrote:
>
> > Dear all,
> >
> >
> > there was some earlier discussion on adding a new encoding for better
> > compression of FP32 and FP64 data.
> >
> >
> > The pull request which extends the format is here:
> > https://github.com/apache/parquet-format/pull/144
> > The change already has one approval from Zoltan.
> >
> >
> > The results from an investigation of compression ratio and speed with the
> > new encoding vs. other encodings are available here:
> > https://github.com/martinradev/arrow-fp-compression-bench
> > In many tests the new encoding performs better in compression ratio and,
> > in some cases, in speed. The improvements in compression speed come from
> > the fact that the new format can potentially lead to faster parsing for
> > some compressors like GZIP.
> >
> >
> > An earlier report which examines other FP compressors (fpzip, spdp, fpc,
> > zfp, sz) and new potential encodings is available here:
> > https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing
> > The report also covers lossy compression, but the BYTE_STREAM_SPLIT
> > encoding focuses only on lossless compression.
> >
> >
> > Can we have a vote?
> >
> >
> > Regards,
> >
> > Martin
> >
> >
>
