Share some experiment results about Gorilla encoding algorithm

Steve Su Sat, 10 Oct 2020 07:21:33 -0700

Hi,

Recently, we realized that the Gorilla encoding algorithm used inside IoTDB 
may have some issues: it can make time series data (the value part) take up 
more space after encoding than before. This is not what we would expect, since 
Gorilla encoding usually makes the data smaller.
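For context, the value-compression scheme described in the Gorilla paper 
(Pelkonen et al., VLDB 2015) XORs each value with the previous one: an 
identical value costs a single '0' bit, and a changed value stores only the 
"meaningful" XOR bits between the leading and trailing zeros. The sketch below 
only counts the bits that scheme would emit; it is not IoTDB's or Michael's 
code, and `GorillaBitCounter` / `encodedBits` are hypothetical names. Capping 
the leading-zero count at 31 so it fits in 5 bits is an assumption about how 
implementations handle that field.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: estimates the size in bits of a series of doubles
 * under the Gorilla XOR value scheme. It counts bits only and does not
 * produce a real bitstream.
 */
public class GorillaBitCounter {

    public static long encodedBits(double[] values) {
        if (values.length == 0) {
            return 0;
        }
        long bits = 64; // first value is stored verbatim
        long prev = Double.doubleToRawLongBits(values[0]);
        int prevLeading = Integer.MAX_VALUE; // sentinel: no stored block yet
        int prevTrailing = 0;
        for (int i = 1; i < values.length; i++) {
            long cur = Double.doubleToRawLongBits(values[i]);
            long xor = cur ^ prev;
            if (xor == 0) {
                bits += 1; // '0': value identical to the previous one
            } else {
                // Assumption: leading-zero count capped so it fits in 5 bits
                int leading = Math.min(Long.numberOfLeadingZeros(xor), 31);
                int trailing = Long.numberOfTrailingZeros(xor);
                if (leading >= prevLeading && trailing >= prevTrailing) {
                    // '10': meaningful bits fit inside the previous block
                    bits += 2 + (64 - prevLeading - prevTrailing);
                } else {
                    // '11' + 5-bit leading count + 6-bit length + meaningful bits
                    int significant = 64 - leading - trailing;
                    bits += 2 + 5 + 6 + significant;
                    prevLeading = leading;
                    prevTrailing = trailing;
                }
            }
            prev = cur;
        }
        return bits;
    }

    public static void main(String[] args) {
        double[] steady = new double[100];
        Arrays.fill(steady, 21.5);
        long raw = 64L * steady.length;
        System.out.println(encodedBits(steady) + " bits vs " + raw + " raw bits");
    }
}
```

For a constant series of 100 values, `main` prints `163 bits vs 6400 raw bits`, 
because every repeated value costs one bit. Conversely, a changed value that 
opens a new block can cost up to 2 + 5 + 6 + 64 = 77 bits, more than its raw 
64 bits, so the scheme can in principle produce output larger than its input. 
This alone does not diagnose the IoTDB issue.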


I found a very good open-source implementation of the Gorilla algorithm by 
Michael on GitHub (see https://github.com/burmanm/gorilla-tsc). I compared the 
encoding/decoding time and the compression ratio of Michael's version with 
those of the version used internally by IoTDB, and found that IoTDB's version 
has a lot of room for improvement.

See 
https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm 
for more experiment details.

I think we can use Michael's implementation as a reference to re-implement the 
algorithm inside IoTDB, fixing potential errors, reducing the encoded size, and 
improving performance. I have created a JIRA issue for this (see 
https://issues.apache.org/jira/browse/IOTDB-938). If possible, I would be happy 
to do the re-implementation.

Thanks,
Steve Su
