Indeed, I think it makes sense for time-series databases. The time field should grow by regular increments, and numerical values of consecutive documents are likely to be close to each other. Both cases should compress efficiently with delta-of-delta encoding.
We haven't really started exploring how to leverage the iterator API of doc values for compression. I think this delta-of-delta approach would be interesting to explore. Maybe we could encode values in blocks, like postings, and decide how to encode each block based on the actual data: delta-of-delta would be one option, but RLE or FOR might sometimes suit the data better.

On Tue, Apr 25, 2017 at 04:43, Otis Gospodnetić <[email protected]> wrote:

> Hi,
>
> I was reading about Facebook Beringei when I spotted this:
>
>    - Extremely efficient streaming compression algorithm. Our streaming
>    compression algorithm is able to compress real world time series data by
>    over 90%. The delta of delta compression algorithm used by Beringei is
>    also fast - we see that a single machine is able to compress more than
>    1.5 million datapoints/second.
>
> That "*delta of delta*" caught my attention.... This delta of delta
> encoding is one of the Facebook Gorilla tricks that allows it to compress
> 16 bytes into 1.37 bytes on average -- see section 4.1 that describes it --
> http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
>
> This seems to be aimed at both time fields and numerical values.
>
> Would Lucene benefit from this?
>
> https://github.com/burmanm/gorilla-tsc seems to be a fresh Java
> implementation.
>
> Otis
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
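For anyone curious what the transform looks like, here is a minimal sketch of integer delta-of-delta encoding in the spirit of section 4.1 of the Gorilla paper. The class and method names are hypothetical (not from Lucene or gorilla-tsc), and the real algorithm additionally bit-packs the deltas-of-deltas with variable-length codes; this only shows why regularly spaced timestamps collapse to runs of zeros, which a block encoder could then pick RLE or tight bit-packing for:

```java
import java.util.Arrays;

public class DeltaOfDelta {

    // Encode as [first value, first delta, then delta-of-delta for each
    // subsequent value]. Regular increments produce mostly zeros.
    public static long[] encode(long[] values) {
        long[] out = new long[values.length];
        if (values.length == 0) return out;
        out[0] = values[0];
        long prevDelta = 0;
        for (int i = 1; i < values.length; i++) {
            long delta = values[i] - values[i - 1];
            out[i] = delta - prevDelta; // delta of delta
            prevDelta = delta;
        }
        return out;
    }

    // Invert the transform: accumulate deltas, then accumulate values.
    public static long[] decode(long[] encoded) {
        long[] out = new long[encoded.length];
        if (encoded.length == 0) return out;
        out[0] = encoded[0];
        long delta = 0;
        for (int i = 1; i < encoded.length; i++) {
            delta += encoded[i];
            out[i] = out[i - 1] + delta;
        }
        return out;
    }

    public static void main(String[] args) {
        // Timestamps arriving roughly every 60 units, with one late sample.
        long[] ts = {1000, 1060, 1120, 1180, 1250};
        long[] enc = encode(ts);
        System.out.println(Arrays.toString(enc));       // [1000, 60, 0, 0, 10]
        System.out.println(Arrays.equals(decode(enc), ts)); // true
    }
}
```

Note how the three equal deltas become two zeros: the more regular the spacing, the more of the block is zeros, which is exactly the case a per-block choice between delta-of-delta, RLE, and FOR would exploit.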
