Hi,

On Tue, Apr 25, 2017 at 4:06 AM, Adrien Grand <[email protected]> wrote:
> I think it makes sense indeed for time-series databases. The time field
> should grow by regular increments, and numerical values of consecutive
> documents are likely to be close to each other. Both cases should compress
> efficiently by doing delta of delta encoding.
>
> We haven't really started exploring leveraging the fact that doc values
> have an iterator API for compression at all. I think this delta-of-delta
> approach would be interesting to explore. Maybe we could encode values in
> blocks like postings and decide how to encode each block based on the
> actual data. Delta-of-delta would be one option, but sometimes we might
> also go with RLE or FOR depending on which one suits the actual data best.

Sounds great! I created https://issues.apache.org/jira/browse/LUCENE-7806

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On Tue, Apr 25, 2017 at 04:43, Otis Gospodnetić <[email protected]> wrote:
>
>> Hi,
>>
>> I was reading about Facebook Beringei when I spotted this:
>>
>>    - Extremely efficient streaming compression algorithm. Our streaming
>>      compression algorithm is able to compress real world time series
>>      data by over 90%. The delta of delta compression algorithm used by
>>      Beringei is also fast - we see that a single machine is able to
>>      compress more than 1.5 million datapoints/second.
>>
>> That "*delta of delta*" caught my attention.... This delta of delta
>> encoding is one of the Facebook Gorilla tricks that allows it to compress
>> 16 bytes into 1.37 bytes on average -- see section 4.1 that describes it --
>> http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
>>
>> This seems to be aimed at both time fields and numerical values.
>>
>> Would Lucene benefit from this?
>>
>> https://github.com/burmanm/gorilla-tsc seems to be a fresh Java
>> implementation.
>>
>> Otis
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
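
For anyone following the thread, here is a rough sketch of the delta-of-delta idea discussed above, applied to a block of timestamps. It is only an illustration under stated assumptions: the class `DeltaOfDelta` and its methods are hypothetical names, it is not a Lucene API, and it does not reproduce the bit-level format used by Gorilla/Beringei. It stores the first value, then the first delta, then the change in delta for every later value, and it reports how many bits a zigzag-encoded value would need so that raw deltas and delta-of-deltas can be compared for a block.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of Lucene or Beringei. */
final class DeltaOfDelta {

  /**
   * out[0] is the first value, out[1] the first delta, and out[i] for
   * i >= 2 is the difference between consecutive deltas.
   */
  static long[] encode(long[] values) {
    long[] out = new long[values.length];
    long prevDelta = 0;
    for (int i = 0; i < values.length; i++) {
      if (i == 0) {
        out[0] = values[0];
        continue;
      }
      long delta = values[i] - values[i - 1];
      out[i] = delta - prevDelta;
      prevDelta = delta;
    }
    return out;
  }

  /** Inverse of encode: rebuilds the running delta, then the values. */
  static long[] decode(long[] encoded) {
    long[] out = new long[encoded.length];
    long delta = 0;
    for (int i = 0; i < encoded.length; i++) {
      if (i == 0) {
        out[0] = encoded[0];
        continue;
      }
      delta += encoded[i];
      out[i] = out[i - 1] + delta;
    }
    return out;
  }

  /** Bits needed for the largest zigzag-encoded value in the array. */
  static int bitsPerValue(long[] values) {
    long max = 0;
    for (long v : values) {
      long zigzag = (v << 1) ^ (v >> 63); // maps small negatives to small positives
      max |= zigzag;
    }
    return 64 - Long.numberOfLeadingZeros(max);
  }

  public static void main(String[] args) {
    // Timestamps that grow by a nearly regular increment.
    long[] timestamps = {1000, 1010, 1020, 1030, 1041};
    long[] encoded = encode(timestamps);
    System.out.println(Arrays.toString(encoded)); // prints [1000, 10, 0, 0, 1]
    System.out.println(Arrays.equals(decode(encoded), timestamps)); // prints true
  }
}
```

In this sample the raw deltas (10, 10, 10, 11) would need 5 bits each after zigzag encoding, while the delta-of-deltas after the first delta (0, 0, 1) need 2 bits. A per-block encoder along the lines Adrien describes could compare such costs and fall back to RLE or FOR when they are cheaper for the block at hand.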
