Hi, I totally agree with Chris.
We can use TSEncoding.GORILLA_V1 and TSEncoding.GORILLA_V2 to represent the two versions of the Gorilla algorithm implementation. When the user specifies Gorilla encoding to create a time series, we can always select the latest version of the encoding for the user.

Steve Su

------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: Oct 12, 2020 (Monday) 2:52 PM
To: "dev-iotdb"<[email protected]>;
Subject: Re: Share some experiment results about Gorilla encoding algorithm

Hi,

Version number +1

When I was working on the TsFile upgrade module, I changed a lot of Javadoc wording from "Old" and "New" TsFile, which confused me a lot, to v1 and v2.

Thanks,
Haonan Hou

> On Oct 12, 2020, at 2:42 PM, Jialin Qiao <[email protected]> wrote:
>
> Hi,
>
> +1 for version number :)
>
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
>
>> -----Original Message-----
>> From: "Christofer Dutz" <[email protected]>
>> Sent: 2020-10-12 14:38:34 (Monday)
>> To: "[email protected]" <[email protected]>
>> Cc:
>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>>
>> Whatever you do : don't call anything "old" or "new".
>>
>> In two years the new "new" might be the new "old"... What happens then?...
>> Append version numbers... That's sustainable...
>>
>> Chris
>> ________________________________
>> From: Jialin Qiao <[email protected]>
>> Sent: Monday, October 12, 2020 08:32
>> To: [email protected] <[email protected]>
>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>>
>> Hi,
>>
>> Maintaining two versions of gorilla encoding is ok.
>>
>> Could we change the default time encoding from TS2_DIFF to Gorilla and keep
>> compatibility?
>>
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>>
>>> -----Original Message-----
>>> From: "Steve Su" <[email protected]>
>>> Sent: 2020-10-11 23:52:55 (Sunday)
>>> To: dev <[email protected]>
>>> Cc:
>>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>>>
>>> Hi,
>>>
>>> From my point of view, since the reimplementation of this algorithm does
>>> not change the structure of TsFile, there is no need to upgrade the version
>>> number of TsFile to 000003.
>>>
>>> I think we can change the name of the old Gorilla encoding to
>>> TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
>>> compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for
>>> the re-implemented version. This may minimize the impact on users.
>>>
>>> What do you think? :)
>>>
>>> Steve Su
>>>
>>> ------------------ Original Message ------------------
>>> From: "dev" <[email protected]>;
>>> Sent: Oct 10, 2020 (Saturday) 11:35
>>> To: "dev"<[email protected]>;
>>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>>>
>>> Hi,
>>>
>>> Nice!
>>>
>>> One question. So, if we reimplement the Gorilla algorithm, how should we
>>> handle version compatibility?
>>>
>>> 1. Upgrade the TsFile version to 000003, or
>>> 2. Add a new encoding name to the corrected gorilla.
>>>
>>> Best,
>>> -----------------------------------
>>> Xiangdong Huang
>>> School of Software, Tsinghua University
>>>
>>>
>>> Steve Su <[email protected]> wrote on Saturday, Oct 10, 2020 at 10:20:
>>>
>>>> Hi,
>>>>
>>>> Recently, we realized that the Gorilla encoding algorithm that has been
>>>> used inside IoTDB may have some issues, because it will cause time series
>>>> data (the value part) to become more space-consuming after encoding. This
>>>> is not in line with expectations. Usually after using Gorilla encoding, the
>>>> data will take up less space.
>>>>
>>>> I found a very good open source Gorilla algorithm implementation by
>>>> Michael on GitHub (see https://github.com/burmanm/gorilla-tsc). I
>>>> compared the difference in encoding / decoding time cost and compression
>>>> rate between the version implemented by Michael and the version used
>>>> internally by IoTDB, and found that the version used inside IoTDB does have
>>>> a lot of room for improvement.
>>>>
>>>> See
>>>> https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
>>>> for more experiment details.
>>>>
>>>> I think we can refer to Michael's implementation to re-implement the
>>>> algorithm inside IoTDB to reduce the compression rate (fix potential
>>>> errors) and improve performance. I have created a JIRA (see
>>>> https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible, I
>>>> would be happy to re-implement the algorithm.
>>>>
>>>> Thanks,
>>>> Steve Su
