Re: Share some experiment results about Gorilla encoding algorithm

Steve Su Mon, 12 Oct 2020 05:32:53 -0700

Hi,

> Could we change the default time encoding from TS2_DIFF to Gorilla and keep 
> it compatible?
Yes.


In Michael's implementation, the time encoding is essentially a 
delta-of-delta encoding (similar to TS2_DIFF, but with some improvements). We 
could reimplement TS2_DIFF based on Michael's implementation and name the old 
and new encodings TS2_DIFF_V1 and TS2_DIFF_V2, respectively.
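
To make the delta-of-delta idea concrete, here is a minimal sketch. It is not 
the TS2_DIFF code and not Michael's code: the class name, the method names and 
the output layout (first timestamp, first delta, then delta-of-deltas) are 
assumptions for illustration only, and the bit-packing step that a real 
encoder applies afterwards is left out.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of IoTDB or gorilla-tsc.
final class DeltaOfDeltaSketch {

    // Assumed layout: [first timestamp, first delta, delta-of-delta, ...].
    static long[] encode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        long prevDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0) {
                out[0] = timestamps[0];        // stored as-is
            } else {
                long delta = timestamps[i] - timestamps[i - 1];
                out[i] = delta - prevDelta;    // first delta when i == 1
                prevDelta = delta;
            }
        }
        return out;
    }

    // Inverse of encode: rebuild each delta, then each timestamp.
    static long[] decode(long[] encoded) {
        long[] out = new long[encoded.length];
        long prevDelta = 0;
        for (int i = 0; i < encoded.length; i++) {
            if (i == 0) {
                out[0] = encoded[0];
            } else {
                long delta = prevDelta + encoded[i];
                out[i] = out[i - 1] + delta;
                prevDelta = delta;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] timestamps = {1000L, 1010L, 1020L, 1031L};
        long[] encoded = encode(timestamps);
        System.out.println(Arrays.toString(encoded));         // [1000, 10, 0, 1]
        System.out.println(Arrays.toString(decode(encoded))); // [1000, 1010, 1020, 1031]
    }
}
```

For regularly sampled series most delta-of-delta values are zero or close to 
zero, which is why they can be stored in very few bits.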

Steve Su

------------------ Original Message ------------------
From: "dev" <[email protected]>;
Sent: 2020-10-12 (Monday) 2:32
To: "dev"<[email protected]>;
Subject: Re: Share some experiment results about Gorilla encoding algorithm

Hi,

Maintaining two versions of the Gorilla encoding is OK.

Could we change the default time encoding from TS2_DIFF to Gorilla and keep 
it compatible?

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

> -----Original Message-----
> From: "Steve Su" <[email protected]>
> Sent: 2020-10-11 23:52:55 (Sunday)
> To: dev <[email protected]>
> Cc: 
> Subject: Re: Share some experiment results about Gorilla encoding algorithm
> 
> Hi,
> 
> From my point of view, since reimplementing this algorithm does not change 
> the structure of TsFile, there is no need to upgrade the TsFile version 
> number to 000003.
> 
> I think we can rename the old Gorilla encoding to TSEncoding.OLD_GORILLA in 
> the code, as long as old TsFiles remain compatible, and then reserve 
> TSEncoding.GORILLA for the reimplemented version. This should minimize the 
> impact on users.
> 
> What do you think? :)
> 
> Steve Su
> 
> ------------------ Original Message ------------------
> From: "dev" <[email protected]>;
> Sent: 2020-10-10 (Saturday) 11:35
> To: "dev"<[email protected]>;
> Subject: Re: Share some experiment results about Gorilla encoding algorithm
> 
> Hi,
> 
> Nice!
> 
> One question: if we reimplement the Gorilla algorithm, how should we handle
> version compatibility?
> 
> 1. Upgrade the TsFile version to 000003, or
> 2. Add a new encoding name for the corrected Gorilla implementation.
> 
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> On Sat, Oct 10, 2020 at 10:20, Steve Su <[email protected]> wrote:
> 
> > Hi,
> >
> > Recently, we realized that the Gorilla encoding algorithm used inside
> > IoTDB may have some issues: after encoding, time series data (the value
> > part) can take up more space than before. This is not what we expect,
> > because Gorilla encoding usually makes the data smaller.
> >
> > I found a very good open source implementation of the Gorilla algorithm
> > by Michael on GitHub (see https://github.com/burmanm/gorilla-tsc). I
> > compared the encoding / decoding time cost and the compression rate of
> > Michael's version with those of the version used internally by IoTDB,
> > and found that the version used inside IoTDB does have a lot of room for
> > improvement.
> >
> > See
> > https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
> > for more experiment details.
> >
> > I think we can use Michael's implementation as a reference and
> > reimplement the algorithm inside IoTDB, so that the encoded data becomes
> > smaller (fixing potential errors) and performance improves. I have
> > created a JIRA issue (see https://issues.apache.org/jira/browse/IOTDB-938)
> > for this. If possible, I would be happy to reimplement the algorithm.
> >
> > Thanks,
> > Steve Su
