Hi,

> Could we change the default time encoding from TS2_DIFF to Gorilla and keep
> compatible?

Yes.

In the version implemented by Michael, the time encoding is essentially a Delta of Delta encoding (similar to TS2_DIFF, but with some improvements). We can reimplement TS2_DIFF based on Michael's implementation and name the two encodings TS2_DIFF_V1 and TS2_DIFF_V2.

Steve Su

------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: 2020-10-12 2:32
To: "dev" <[email protected]>;
Subject: Re: Share some experiment results about Gorilla encoding algorithm

Hi,

Maintaining two versions of gorilla encoding is ok.

Could we change the default time encoding from TS2_DIFF to Gorilla and keep compatible?

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

> -----Original Message-----
> From: "Steve Su" <[email protected]>
> Sent: 2020-10-11 23:52:55
> To: dev <[email protected]>
> Cc:
> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>
> Hi,
>
> From my point of view, since the reimplementation of this algorithm does not
> change the structure of TsFile, there is no need to upgrade the version
> number of TsFile to 000003.
>
> I think we can change the name of the old Gorilla encoding to
> TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
> compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for the
> re-implemented version. This may minimize the impact on users.
>
> What do you think? :)
>
> Steve Su
>
> ------------------ Original Message ------------------
> From: "dev" <[email protected]>;
> Sent: 2020-10-10 11:35
> To: "dev" <[email protected]>;
> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>
> Hi,
>
> Nice!
>
> One question. So, if we reimplement the Gorilla algorithm, how to consider
> the version compatibility?
>
> 1. Upgrade the TsFile version to 000003, or
> 2. Add a new encoding name to the corrected gorilla.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> Steve Su <[email protected]> wrote on 2020-10-10 at 10:20:
>
> > Hi,
> >
> > Recently, we realized that the Gorilla encoding algorithm that has been
> > used inside IoTDB may have some issues, because it will cause time series
> > data (the value part) to become more space-consuming after encoding. This
> > is not in line with expectations. Usually after using Gorilla encoding, the
> > data will take up less space.
> >
> > I found a very good open source Gorilla algorithm implementation by
> > Michael on Github (see https://github.com/burmanm/gorilla-tsc). I
> > compared the difference in encoding / decoding time cost and compression
> > rate between the version implemented by Michael and the version used
> > internally by IoTDB, and found that the version used inside IoTDB does have
> > a lot of room for improvement.
> >
> > See
> > https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
> > for more experiment details.
> >
> > I think we can refer to Michael's implementation to re-implement the
> > algorithm inside IoTDB to reduce the compression rate (fix potential
> > errors) and improve performance. I have created a JIRA (see
> > https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible, I
> > would be happy to re-implement the algorithm.
> >
> > Thanks,
> > Steve Su
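
For readers unfamiliar with the Delta of Delta idea mentioned at the top of this thread, here is a minimal Java sketch. It is an illustration only and is not the code of TS2_DIFF or of Michael's gorilla-tsc; the class name DeltaOfDeltaSketch and its methods are hypothetical. It keeps the first timestamp, then the first delta, then the differences between consecutive deltas, which are mostly zero or small for regularly sampled series. A real encoder would additionally bit-pack these small values; that step is omitted here.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical names, not IoTDB or gorilla-tsc code. */
final class DeltaOfDeltaSketch {

    /** out[0] = first timestamp, out[1] = first delta, out[i] = delta-of-delta for i >= 2. */
    static long[] encode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        long prevDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0) {
                out[0] = timestamps[0];
            } else {
                long delta = timestamps[i] - timestamps[i - 1];
                // For i == 1, prevDelta is 0, so the raw first delta is stored.
                out[i] = delta - prevDelta;
                prevDelta = delta;
            }
        }
        return out;
    }

    /** Inverse of encode: rebuilds deltas, then timestamps. */
    static long[] decode(long[] encoded) {
        long[] timestamps = new long[encoded.length];
        long prevDelta = 0;
        for (int i = 0; i < encoded.length; i++) {
            if (i == 0) {
                timestamps[0] = encoded[0];
            } else {
                long delta = prevDelta + encoded[i];
                timestamps[i] = timestamps[i - 1] + delta;
                prevDelta = delta;
            }
        }
        return timestamps;
    }

    public static void main(String[] args) {
        long[] ts = {1000L, 1010L, 1020L, 1031L};
        long[] enc = encode(ts);
        System.out.println(Arrays.toString(enc));           // [1000, 10, 0, 1]
        System.out.println(Arrays.equals(decode(enc), ts)); // true
    }
}
```

With a fixed sampling interval almost every stored value after the second is 0, which is why this style of time encoding tends to compress well.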
