Hi Chris,

You're right. IoTDB does use ZigZag encoding for variable-length
signed integers. And as you said, if the number is negative, it is
XORed with all bits set to 1, which is identical to flipping the
bits.

Shifting the negative int32 value left by one and then flipping all
the bits is exactly what ZigZag encoding calls for.
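
To make the bit manipulation concrete, here is a minimal sketch. It is
not copied from ReadWriteForEncodingUtils; the class and method names
(ZigZagSketch, encode, decode, writeUnsignedVarInt, writeVarInt) are
made up for illustration, and the byte layout assumes the usual
7-bits-per-byte varint described further down in this thread.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; names are hypothetical, not IoTDB's API.
final class ZigZagSketch {

    // ZigZag: shift left by one, then XOR with the sign mask.
    // (n >> 31) is an arithmetic shift: 0 for n >= 0, all ones for n < 0,
    // so non-negative values are left alone and negative values have
    // every bit flipped after the shift.
    static int encode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse: the lowest bit is the sign, the remaining bits the magnitude.
    static int decode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Unsigned varint: 7 payload bits per byte, lowest group first;
    // a set high bit means "more bytes follow".
    static byte[] writeUnsignedVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value & 0x7F);
        return out.toByteArray();
    }

    // Signed varint: ZigZag first, then the unsigned varint.
    static byte[] writeVarInt(int n) {
        return writeUnsignedVarInt(encode(n));
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 1, -2, 2, -64, 64};
        for (int n : samples) {
            System.out.println(n + " -> zigzag " + encode(n)
                + ", " + writeVarInt(n).length + " byte(s)");
        }
    }
}
```

Without the ZigZag step, a value such as -1 has all 32 bits set and
needs five varint bytes; after ZigZag it becomes 1 and fits in one.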


Best,
---------------
Yuan Tian

On Fri, Jun 17, 2022 at 6:55 PM Christofer Dutz
<[email protected]> wrote:
>
> Hi Xiangdong,
>
> I doubt you invented a new encoding form. So, in general, I was asking which 
> form this actually is.
> Julian already pointed out that bit of code.
>
> So, as far as I can see, the sign information is in the least significant bit. 
> This would usually be an indicator of ZigZag encoding. The only part I don’t 
> quite understand is the bit-flipping in the case of negative values. In 
> ZigZag encoding, the value would be shifted left by one and the last bit would 
> be set as the new first bit (so effectively the last bit would just be 
> rotated to become the first). In IoTDB it seems as if the left-shifted value 
> is inverted. I don’t quite understand why that is happening. I could imagine 
> that for small negative integers (small as in “close to 0”) the two’s complement 
> notation has many 1s, and would therefore consume a lot of memory in 
> serialized form. So, flipping the entire number would get rid of these 1s and 
> hence reduce the size of the serialized form.
>
> But going through this document again: 
> https://golb.hplar.ch/2019/06/variable-length-int-java.html
>
> If the number is negative, it is x-ored with all bits set to 1 … so this is 
> identical to flipping the bits … this is actually really cool and efficient.
>
> So, I would like to confirm that IoTDB uses ZigZag encoding for variable-length 
> signed integers. A comment in the utils class stating which encoding is 
> actually used would be a great addition. I’ll probably add one asap.
>
> Chris
>
>
>
>
> From: Xiangdong Huang <[email protected]>
> Sent: Friday, June 17, 2022 09:33
> To: dev <[email protected]>; Yuan Tian <[email protected]>
> Subject: Re: Var-Length-Numeric encoding?
>
> Hi,
>
> I think the encoding implementation is in 
> src/main/java/org/apache/iotdb/tsfile/utils/ReadWriteForEncodingUtils.java
> @Yuan Tian implemented it.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> On Mon, Jun 13, 2022 at 17:47, Julian Feinauer <[email protected]> wrote:
> Hi,
>
> I can only comment on floating points: we don't.
> Currently we also only have var-length encoding for u32 (not for u64).
>
> Regarding ZigZag encoding, perhaps somebody else can jump in here?
>
> Julian
>
> Julian Feinauer
> Geschäftsführer/CEO
>
> From: Christofer Dutz <[email protected]>
> Date: Monday, June 13, 2022 at 09:50
> To: [email protected] <[email protected]>
> Subject: Var-Length-Numeric encoding?
>
> Hi all,
>
> Just out of curiosity. Julian told me TSFiles make use of variable length 
> encoding of numeric types.
> I would expect the encoding for unsigned integers to be the "ordinary" one 
> where 7 bits of a byte are being used for encoding the numeric value and new 
> bytes are added as long as the first bit is 1.
> However, I would be interested in which encoding is being used for signed 
> integers. Julian posted a reply in the #iotdb slack channel, but I'm unsure 
> which official encoding type this is.
> It most likely looks like ZigZag Encoding, but I'm a bit unsure if it really 
> is.
> Could anyone here please shed a bit of light on this? And do we have 
> var-length encoding for floating-point types too?
>
> Chris
