This looks promising to me. At first glance it seems to combine
simplicity and efficiency.
I'd like to hear more from other members of the PMC.

On Tue, Aug 27, 2019 at 5:30 AM Radev, Martin <martin.ra...@tum.de> wrote:

> Dear all,
>
>
> there was some earlier discussion on adding a new encoding for better
> compression of FP32 and FP64 data.
>
>
> The pull request which extends the format is here:
> https://github.com/apache/parquet-format/pull/144
> The change already has one approval, from Zoltan.
>
>
> The results from an investigation of compression ratio and speed with the
> new encoding vs other encodings are available here:
> https://github.com/martinradev/arrow-fp-compression-bench
> In many tests the new encoding achieves a better compression ratio, and in
> some cases better speed. The improvements in compression speed come from
> the fact that the new format can potentially lead to faster parsing for
> some compressors like GZIP.
>
>
> An earlier report which examines other FP compressors (fpzip, spdp, fpc,
> zfp, sz) and new potential encodings is available here:
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing
> The report also covers lossy compression, but the BYTE_STREAM_SPLIT
> encoding focuses only on lossless compression.
>
>
> Can we have a vote?
>
>
> Regards,
>
> Martin
>
>
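
For readers who have not followed the pull request, here is a minimal sketch of the idea behind BYTE_STREAM_SPLIT as I understand it: byte k of every value is scattered into stream k, and the streams are then concatenated, so that a general-purpose compressor sees the (often similar) sign/exponent bytes next to each other. The function names and the use of Python's struct module are my own assumptions for illustration; this is not the code from the pull request or from the benchmark repository.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Hypothetical helper: pack values, then group byte k of each value into stream k."""
    width = struct.calcsize(fmt)  # 4 for "<f" (FP32), 8 for "<d" (FP64)
    raw = b"".join(struct.pack(fmt, v) for v in values)
    # raw[k::width] picks byte k of every packed value; the streams are concatenated.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_split_decode(data, fmt="<f"):
    """Hypothetical helper: inverse of byte_stream_split_encode."""
    width = struct.calcsize(fmt)
    count = len(data) // width
    streams = [data[k * count:(k + 1) * count] for k in range(width)]
    # Re-interleave: take one byte from each stream to rebuild each value.
    raw = bytes(b for group in zip(*streams) for b in group)
    return [struct.unpack_from(fmt, raw, i * width)[0] for i in range(count)]


if __name__ == "__main__":
    encoded = byte_stream_split_encode([1.0, 2.0])
    print(encoded.hex())
    print(byte_stream_split_decode(encoded))
```

The encoding itself does not shrink the data; the output has the same length as the plain encoding, and any gain comes from running a compressor such as GZIP over the rearranged bytes afterwards.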
