Re: Share some experiment results about Gorilla encoding algorithm

Steve Su Mon, 12 Oct 2020 05:04:54 -0700

Hi,

I totally agree with Chris.


We can use TSEncoding.GORILLA_V1 and TSEncoding.GORILLA_V2 to represent the two 
versions of the Gorilla algorithm implementation. When a user specifies Gorilla 
encoding while creating a time series, we can always select the latest version 
of the encoding for them.
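
As a rough illustration only: the enum below is a hypothetical stand-in, not 
the actual IoTDB TSEncoding class, and the byte codes and the helper names 
fromCode / fromUserName are assumptions made for this sketch. It shows 
versioned constants where the plain user-facing name "GORILLA" resolves to the 
latest version, while the code stored in a file still identifies the exact 
version that wrote it.

```java
import java.util.Locale;

// Hypothetical sketch; not IoTDB's real TSEncoding. Byte codes are made up.
public class EncodingVersions {

    enum Encoding {
        GORILLA_V1((byte) 1),   // original implementation, kept so old files stay readable
        GORILLA_V2((byte) 2);   // re-implemented version

        private final byte code;

        Encoding(byte code) {
            this.code = code;
        }

        byte code() {
            return code;
        }

        // Map the byte stored in a file back to the exact version that wrote it.
        static Encoding fromCode(byte code) {
            for (Encoding e : values()) {
                if (e.code == code) {
                    return e;
                }
            }
            throw new IllegalArgumentException("Unknown encoding code: " + code);
        }

        // Map a user-supplied name; plain "GORILLA" is an alias for the latest version.
        static Encoding fromUserName(String name) {
            String n = name.trim().toUpperCase(Locale.ROOT);
            if (n.equals("GORILLA")) {
                return GORILLA_V2;
            }
            return valueOf(n);
        }
    }

    public static void main(String[] args) {
        System.out.println(Encoding.fromUserName("gorilla"));
        System.out.println(Encoding.fromCode((byte) 1));
    }
}
```

With this shape, adding a later GORILLA_V3 would only require a new constant 
and a change to the alias; data written as V1 or V2 would keep decoding with 
the version recorded in the file.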

Steve Su

------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: 2020-10-12 (Monday) 2:52 PM
To: "dev-iotdb"<[email protected]>;
Subject: Re: Share some experiment results about Gorilla encoding algorithm

Hi,

Version number +1

When I was working on the TsFile upgrade module, I changed a lot of javadoc 
that referred to "Old" and "New" TsFile, which I found very confusing, to v1 
and v2.

Thanks,

Haonan Hou

> On Oct 12, 2020, at 2:42 PM, Jialin Qiao <[email protected]> wrote:
> 
> Hi,
> 
> +1 for version number :)
> 
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
> 
> 
>> -----Original Message-----
>> From: "Christofer Dutz" <[email protected]>
>> Sent: 2020-10-12 14:38:34 (Monday)
>> To: "[email protected]" <[email protected]>
>> Cc: 
>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>> 
>> Whatever you do : don't call anything "old" or "new".
>> 
>> In two years the new "new" might be the new "old"... What happens then?... 
>> Append version numbers... That's sustainable...
>> 
>> Chris
>> ________________________________
>> From: Jialin Qiao <[email protected]>
>> Sent: Monday, 12 October 2020 08:32
>> To: [email protected] <[email protected]>
>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>> 
>> Hi,
>> 
>> Maintaining two versions of gorilla encoding is ok.
>> 
>> Could we change the default time encoding from TS2_DIFF to Gorilla while 
>> keeping compatibility?
>> 
>> Thanks,
>> --
>> Jialin Qiao
>> School of Software, Tsinghua University
>> 
>> 
>>> -----Original Message-----
>>> From: "Steve Su" <[email protected]>
>>> Sent: 2020-10-11 23:52:55 (Sunday)
>>> To: dev <[email protected]>
>>> Cc:
>>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>>> 
>>> Hi,
>>> 
>>> From my point of view, since the reimplementation of this algorithm does 
>>> not change the structure of TsFile, there is no need to upgrade the version 
>>> number of TsFile to 000003.
>>> 
>>> I think we can rename the old Gorilla encoding to TSEncoding.OLD_GORILLA 
>>> in the code, provided that old TsFiles remain compatible, and then 
>>> reserve TSEncoding.GORILLA for the re-implemented version. This may 
>>> minimize the impact on users.
>>> 
>>> What do you think? :)
>>> 
>>> Steve Su
>>> 
>>> ------------------ Original Message ------------------
>>> From: "dev" <[email protected]>;
>>> Sent: 2020-10-10 (Saturday) 11:35
>>> To: "dev"<[email protected]>;
>>> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>>> 
>>> Hi,
>>> 
>>> Nice!
>>> 
>>> One question: if we reimplement the Gorilla algorithm, how should we
>>> handle version compatibility?
>>> 
>>> 1. Upgrade the TsFile version to 000003, or
>>> 2. Add a new encoding name for the corrected Gorilla.
>>> 
>>> Best,
>>> -----------------------------------
>>> Xiangdong Huang
>>> School of Software, Tsinghua University
>>> 
>>> 
>>> 
>>> Steve Su <[email protected]> wrote on Sat, 10 Oct 2020 at 10:20:
>>> 
>>>> Hi,
>>>> 
>>>> Recently, we realized that the Gorilla encoding algorithm that has been
>>>> used inside IoTDB may have some issues, because it causes time series
>>>> data (the value part) to take up more space after encoding. This is
>>>> not in line with expectations: usually, after Gorilla encoding, the
>>>> data takes up less space.
>>>> 
>>>> I found a very good open source Gorilla algorithm implementation by
>>>> Michael on Github (see https://github.com/burmanm/gorilla-tsc). I
>>>> compared the difference in encoding / decoding time cost and compression
>>>> rate between the version implemented by Michael and the version used
>>>> internally by IoTDB, and found that the version used inside IoTDB does have
>>>> a lot of room for improvement.
>>>> 
>>>> See
>>>> https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
>>>> for more experiment details.
>>>> 
>>>> I think we can refer to Michael's implementation to re-implement the
>>>> algorithm inside IoTDB to reduce the encoded size (fixing potential
>>>> errors) and improve performance. I have created a JIRA (see
>>>> https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible, I
>>>> would be happy to re-implement the algorithm.
>>>> 
>>>> Thanks,
>>>> Steve Su
