Hi, I was reading about Facebook Beringei when I spotted this:
- Extremely efficient streaming compression algorithm. Our streaming compression algorithm is able to compress real world time series data by over 90%. The delta of delta compression algorithm used by Beringei is also fast - we see that a single machine is able to compress more than 1.5 million datapoints/second.

That "*delta of delta*" caught my attention... This delta of delta encoding is one of the Facebook Gorilla tricks that allow it to compress each 16-byte data point into 1.37 bytes on average -- see section 4.1, which describes it -- http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

The paper applies delta of delta to timestamps and pairs it with an XOR-based scheme for floating-point values, so between the two it covers both time fields and numerical values. Would Lucene benefit from this?

https://github.com/burmanm/gorilla-tsc seems to be a fresh Java implementation.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
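
P.S. For anyone who hasn't seen the idea before, here is a minimal sketch of the delta of delta transform on timestamps. It is only my illustration, not code from Beringei, Gorilla, or gorilla-tsc: the class and method names are made up, and it leaves out the variable-length bit packing that the paper uses to actually shrink the output. It just shows why regularly spaced timestamps turn into long runs of zeros and other tiny numbers.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of any of the projects mentioned above.
public class DeltaOfDelta {

    // out[0] is the first timestamp, out[1] is the first delta,
    // and every later entry is (current delta - previous delta).
    public static long[] encode(long[] ts) {
        long[] out = new long[ts.length];
        if (ts.length == 0) {
            return out;
        }
        out[0] = ts[0];
        long prevDelta = 0;
        for (int i = 1; i < ts.length; i++) {
            long delta = ts[i] - ts[i - 1];
            out[i] = delta - prevDelta;
            prevDelta = delta;
        }
        return out;
    }

    // Inverse of encode: rebuild the running delta, then the timestamps.
    public static long[] decode(long[] enc) {
        long[] ts = new long[enc.length];
        if (enc.length == 0) {
            return ts;
        }
        ts[0] = enc[0];
        long delta = 0;
        for (int i = 1; i < enc.length; i++) {
            delta += enc[i];
            ts[i] = ts[i - 1] + delta;
        }
        return ts;
    }

    public static void main(String[] args) {
        // Roughly one point per 60 seconds, with one point arriving a second late.
        long[] ts = {1000L, 1060L, 1120L, 1180L, 1241L, 1301L};
        long[] enc = encode(ts);
        System.out.println(Arrays.toString(enc));         // [1000, 60, 0, 0, 1, -1]
        System.out.println(Arrays.equals(decode(enc), ts)); // true
    }
}
```

After the first two entries, everything is 0, 1, or -1, which is what lets a bit-level encoder spend a bit or a few bits per timestamp instead of 8 bytes.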
