Re: Share some experiment results about Gorilla encoding algorithm

Xiangdong Huang Sun, 11 Oct 2020 21:06:58 -0700

Hi,

> I think we can change the name of the old Gorilla encoding to
> TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
> compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for
> the re-implemented version. This may minimize the impact on users.


I prefer this approach. OLD_GORILLA can still be serialized as "6", and
we then assign a new short value to the new Gorilla encoding.
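
A minimal sketch of what this could look like, assuming an enum that maps each encoding to a short code. TSEncodingSketch, serialize(), deserialize(), and the value 8 are hypothetical names and placeholders for illustration, not the actual IoTDB code; only the value 6 for the old encoding comes from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual IoTDB TSEncoding source.
enum TSEncodingSketch {
  // The legacy implementation keeps the serialized value 6,
  // so TsFiles written with it can still be decoded.
  OLD_GORILLA((short) 6),
  // The re-implemented version gets a new value.
  // 8 is a placeholder only; the real value was not decided in this thread.
  GORILLA((short) 8);

  // Lookup table from serialized code to enum constant.
  private static final Map<Short, TSEncodingSketch> BY_CODE = new HashMap<>();

  static {
    for (TSEncodingSketch e : values()) {
      BY_CODE.put(e.code, e);
    }
  }

  private final short code;

  TSEncodingSketch(short code) {
    this.code = code;
  }

  // Value written into the file for this encoding.
  short serialize() {
    return code;
  }

  // Maps a value read from a file back to an encoding.
  static TSEncodingSketch deserialize(short code) {
    TSEncodingSketch e = BY_CODE.get(code);
    if (e == null) {
      throw new IllegalArgumentException("Unknown encoding code: " + code);
    }
    return e;
  }
}

class EncodingDemo {
  public static void main(String[] args) {
    // A file written before the change stores 6 and resolves to the legacy decoder.
    System.out.println(TSEncodingSketch.deserialize((short) 6)); // prints OLD_GORILLA
  }
}
```

With this layout, old files keep resolving to the legacy decoder, while newly written files that ask for GORILLA store the new value and use the re-implemented codec.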

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Sun, Oct 11, 2020 at 11:53 PM, Steve Su <[email protected]> wrote:

> Hi,
>
> From my point of view, since the reimplementation of this algorithm does
> not change the structure of TsFile, there is no need to upgrade the version
> number of TsFile to 000003.
>
> I think we can change the name of the old Gorilla encoding to
> TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
> compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for
> the re-implemented version. This may minimize the impact on users.
>
> What do you think? :)
>
> Steve Su
>
> ------------------ Original Message ------------------
> From: "dev" <[email protected]>;
> Sent: Saturday, October 10, 2020, 11:35 PM
> To: "dev"<[email protected]>;
> Subject: Re: Share some experiment results about Gorilla encoding algorithm
>
> Hi,
>
> Nice!
>
> One question: if we reimplement the Gorilla algorithm, how should we
> handle version compatibility?
>
> 1. Upgrade the TsFile version to 000003, or
> 2. Give the corrected Gorilla implementation a new encoding name.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> On Sat, Oct 10, 2020 at 10:20 PM, Steve Su <[email protected]> wrote:
>
> > Hi,
> >
> > Recently, we realized that the Gorilla encoding algorithm that has been
> > used inside IoTDB may have some issues, because it will cause time series
> > data (the value part) to become more space-consuming after encoding. This
> > is not in line with expectations. Usually after using Gorilla encoding,
> > the data will take up less space.
> >
> > I found a very good open source Gorilla algorithm implementation by
> > Michael on Github (see https://github.com/burmanm/gorilla-tsc). I
> > compared the difference in encoding / decoding time cost and compression
> > rate between the version implemented by Michael and the version used
> > internally by IoTDB, and found that the version used inside IoTDB does
> > have a lot of room for improvement.
> >
> > See
> > https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
> > for more experiment details.
> >
> > I think we can refer to Michael's implementation to re-implement the
> > algorithm inside IoTDB to reduce the encoded data size (fixing potential
> > errors) and improve performance. I have created a JIRA issue (see
> > https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible,
> > I would be happy to re-implement the algorithm.
> >
> > Thanks,
> > Steve Su
