I'm interested in this. I have been busy the last couple of weeks so have not been able to take a closer look. I will try to give some feedback this week.
Thanks

On Tue, Sep 3, 2019, 2:17 PM Radev, Martin <[email protected]> wrote:

> Hello all,
>
> thank you Julien for the interest.
>
> Could other people, part of Apache Parquet, share their opinions?
> Do you have your own data which you would like to use for testing the new
> encoding?
>
> Regards,
> Martin
>
> ________________________________
> From: Julien Le Dem <[email protected]>
> Sent: Friday, August 30, 2019 2:38:37 AM
> To: [email protected]
> Cc: Raoofy, Amir; Karlstetter, Roman
> Subject: Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet
>
> I think this looks promising to me. At first glance it seems combining
> simplicity and efficiency.
> I'd like to hear more from other members of the PMC.
>
> On Tue, Aug 27, 2019 at 5:30 AM Radev, Martin <[email protected]> wrote:
>
> > Dear all,
> >
> > there was some earlier discussion on adding a new encoding for better
> > compression of FP32 and FP64 data.
> >
> > The pull request which extends the format is here:
> > https://github.com/apache/parquet-format/pull/144
> > The change has one approval from earlier from Zoltan.
> >
> > The results from an investigation on compression ratio and speed with the
> > new encoding vs other encodings is available here:
> > https://github.com/martinradev/arrow-fp-compression-bench
> > It is visible that for many tests the new encoding performs better in
> > compression ratio and in some cases in speed. The improvements in
> > compression speed come from the fact that the new format can potentially
> > lead to a faster parsing for some compressors like GZIP.
> >
> > An earlier report which examines other FP compressors (fpzip, spdp, fpc,
> > zfp, sz) and new potential encodings is available here:
> > https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing
> > The report also covers lossy compression but the BYTE_STREAM_SPLIT
> > encoding only has the focus of lossless compression.
> >
> > Can we have a vote?
> >
> > Regards,
> > Martin
