Hi,

Recently, we realized that the Gorilla encoding implementation used inside 
IoTDB may have a problem: the value part of time series data can take up more 
space after encoding than before. This is not what we expect, since Gorilla 
encoding normally reduces the space the data occupies.
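
For reference, here is a minimal sketch of how the size of Gorilla-encoded 
values can be estimated. It follows the XOR-based value scheme as I understand 
it from the Gorilla paper, and it is neither the IoTDB code nor Michael's 
code. The class and method names (GorillaSizeSketch, encodedBits) are made up 
for illustration, and it only counts bits instead of writing a real stream.

```java
final class GorillaSizeSketch {

    // Counts the bits the XOR-based value scheme would need for a series.
    // Assumption: first value is stored raw (64 bits), leading-zero count
    // is stored in 5 bits, meaningful-bit length is stored in 6 bits.
    static long encodedBits(double[] values) {
        if (values.length == 0) {
            return 0;
        }
        long bits = 64; // first value, uncompressed
        long prev = Double.doubleToRawLongBits(values[0]);
        int prevLead = -1;  // -1 means no previous window exists yet
        int prevTrail = -1;
        for (int i = 1; i < values.length; i++) {
            long cur = Double.doubleToRawLongBits(values[i]);
            long xor = prev ^ cur;
            if (xor == 0) {
                bits += 1; // same value as before: single '0' bit
            } else {
                // 5 bits can only hold 0..31, so cap the leading-zero count
                int lead = Math.min(Long.numberOfLeadingZeros(xor), 31);
                int trail = Long.numberOfTrailingZeros(xor);
                if (prevLead >= 0 && lead >= prevLead && trail >= prevTrail) {
                    // fits the previous window: '10' + meaningful bits
                    bits += 2 + (64 - prevLead - prevTrail);
                } else {
                    // new window: '11' + 5 bits + 6 bits + meaningful bits
                    bits += 2 + 5 + 6 + (64 - lead - trail);
                    prevLead = lead;
                    prevTrail = trail;
                }
            }
            prev = cur;
        }
        return bits;
    }

    public static void main(String[] args) {
        double[] series = {21.5, 21.5, 21.5, 21.6, 21.6, 21.7};
        long raw = 64L * series.length;
        System.out.println("raw bits: " + raw
            + ", estimated Gorilla bits: " + encodedBits(series));
    }
}
```

In this scheme a repeated value costs one bit, while the worst case for a 
single value is 77 bits (2 + 5 + 6 + 64). So some expansion is possible on 
very noisy data, but it should not happen on slowly changing series.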

I found a very good open source implementation of the Gorilla algorithm by 
Michael on GitHub (see https://github.com/burmanm/gorilla-tsc). I compared the 
encoding / decoding time and the compression ratio of Michael's version with 
those of the version used internally by IoTDB, and found that the IoTDB 
version has a lot of room for improvement.

See 
https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm 
for more experiment details.

I think we can use Michael's implementation as a reference to re-implement the 
algorithm inside IoTDB, so that the encoded data becomes smaller (fixing 
potential errors along the way) and encoding / decoding becomes faster. I have 
created a JIRA issue (see https://issues.apache.org/jira/browse/IOTDB-938) for 
this. If possible, I would be happy to do the re-implementation.

Thanks,
Steve Su
