Steve Yurong Su created IOTDB-938:
-------------------------------------
Summary: Re-implement Gorilla encoding algorithm
Key: IOTDB-938
URL: https://issues.apache.org/jira/browse/IOTDB-938
Project: Apache IoTDB
Issue Type: Improvement
Components: Core/TsFile
Reporter: Steve Yurong Su
Recently, we realized that the Gorilla encoding algorithm used inside IoTDB may
have some issues: it can make time series data (the value part) take up more
space after encoding than before. This is not what we expect, since Gorilla
encoding usually makes the data take up less space.
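To show why Gorilla encoding normally shrinks the value column, here is a minimal
sketch that only counts the bits produced by the XOR value compression described in
the original Gorilla paper (Pelkonen et al., VLDB 2015). It is not IoTDB's encoder
and not Michael's gorilla-tsc code. The class name GorillaValueBitCounter and the
method encodedBits are hypothetical, and it measures size only, without building a
decodable bit stream. The cap of 31 on the leading-zero count follows the paper's
5-bit field.

```java
import java.util.Arrays;

// Hypothetical helper (not IoTDB or gorilla-tsc code): counts the bits that
// Gorilla-style XOR value compression would emit for a series of doubles.
// It only measures size; it does not build a decodable bit stream.
class GorillaValueBitCounter {

    static long encodedBits(double[] values) {
        if (values.length == 0) {
            return 0;
        }
        long bits = 64; // the first value is stored uncompressed
        long prev = Double.doubleToRawLongBits(values[0]);
        int prevLeading = Integer.MAX_VALUE; // no "meaningful bits" window stored yet
        int prevTrailing = 0;
        for (int i = 1; i < values.length; i++) {
            long cur = Double.doubleToRawLongBits(values[i]);
            long xor = cur ^ prev;
            if (xor == 0) {
                bits += 1; // control bit '0': the value repeats
            } else {
                // Leading-zero count is capped at 31 so it fits a 5-bit field.
                int leading = Math.min(Long.numberOfLeadingZeros(xor), 31);
                int trailing = Long.numberOfTrailingZeros(xor);
                if (leading >= prevLeading && trailing >= prevTrailing) {
                    // '10': meaningful bits fit inside the previous window, reuse it
                    bits += 2 + (64 - prevLeading - prevTrailing);
                } else {
                    // '11' + 5-bit leading count + 6-bit length + meaningful bits
                    bits += 2 + 5 + 6 + (64 - leading - trailing);
                    prevLeading = leading;
                    prevTrailing = trailing;
                }
            }
            prev = cur;
        }
        return bits;
    }

    public static void main(String[] args) {
        double[] constant = new double[100];
        Arrays.fill(constant, 12.5);
        double[] stepping = {1.0, 2.0, 1.0};
        System.out.println("constant: raw " + 64L * constant.length
                + " bits, encoded " + encodedBits(constant) + " bits");
        System.out.println("stepping: raw " + 64L * stepping.length
                + " bits, encoded " + encodedBits(stepping) + " bits");
    }
}
```

For 100 copies of 12.5 it reports 163 bits (64 for the first value plus one bit per
repeat) against 6400 bits raw. For 1.0, 2.0, 1.0 the third value reuses the window
opened by the second and costs only 13 bits.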
I found a very good open-source implementation of the Gorilla algorithm by Michael
on GitHub (see https://github.com/burmanm/gorilla-tsc). I compared Michael's
implementation with the one used inside IoTDB in terms of encoding/decoding time
and compression rate, and found that the IoTDB version has a lot of room for
improvement.
See
https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
for more details on the experiments.
I think we can use Michael's implementation as a reference to re-implement the
algorithm inside IoTDB. This should lower the compression rate (i.e., make the
encoded data smaller), fix potential errors, and improve performance.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)