[ 
https://issues.apache.org/jira/browse/THRIFT-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235545#comment-17235545
 ] 

Allen George edited comment on THRIFT-5299 at 11/19/20, 3:23 PM:
-----------------------------------------------------------------

II was curious about why this didn't break all the cross tests, so I took a 
look at the code again. I noticed that the *only* time the compact protocol 
spec uses (varint, non-zigzag) for a signed integer is for:

# collection, message, name sizes
# sequence numbers

It makes no sense for collection sizes and lengths to be negative, and indeed 
the spec states that in a number of places. For example: 

* {{byte length is the length of the byte array, using var int encoding (must 
be >= 0)}} (Binary encoding)
* {{name length is the byte length of the name field, a signed 32 bit integer 
encoded as a var int (must be >= 0).}} (Message)
* {{size is the size, a var int (int32), positive values 15 or higher}} (List 
and set)
* {{size is the size, a var int (int32), strictly positive values}} (Map)

Indeed, for sizes the rust implementation takes an {{i32}} and immediately 
casts it to a {{u32}} before encoding it onto the wire. I also checked 
{{integer-rs}}, and as expected unsigned ints are encoded in varint-only form. 
So, as far as the spec is concerned, for sizes the compact protocol is doing 
the right thing (there's a separate question as to why sizes should *ever* be 
negative in the library; I'll look into changing all sizes into {{u32}}, and 
what the blast radius of that change should be).

This leaves the sequence number, which is signed and unzigzagged as the only 
problem. I also checked, and I think there *may* be another issue with writing 
out {{i16}} integers. The protobuf varint spec says that {{i16}} should be 
written in no more bits than an {{i32}}; I'll have to check and see if that's 
the case.

TL;DR:

* Only the sequence number may be a problem here.
* I'll have to check if writing {{i16}} uses more bytes than necessary.
* Field ids and collection/binary/other sizes should be represented in the rust 
lib as {{usize}} or {{u##}} types instead of {{i##}} types.


was (Author: allengeorge):
II was curious about why this didn't break all the cross tests, so I took a 
look at the code again. I noticed that the *only* time the compact protocol 
spec uses (varint, non-zigzag) for a signed integer is for:

# collection, message, name sizes
# sequence numbers

It makes no sense for collection sizes and lengths to be negative, and indeed 
the spec states that in a number of places. For example: 

* {{byte length is the length of the byte array, using var int encoding (must 
be >= 0)}} (Binary encoding)
* {{name length is the byte length of the name field, a signed 32 bit integer 
encoded as a var int (must be >= 0).}} (Message)
* {{size is the size, a var int (int32), positive values 15 or higher}} (List 
and set)
* {{size is the size, a var int (int32), strictly positive values}} (Map)

Indeed, for sizes the rust implementation takes an {{i32}} and immediately 
casts it to a {{u32}} before encoding it onto the wire. I also checked 
{{integer-rs}}, and as expected unsigned ints are encoded in varint-only form. 
So, as far as the spec is concerned, for sizes the compact protocol is doing 
the right thing (there's a separate question as to why sizes should *ever* be 
negative in the library; I'll look into changing all sizes into {{u32}}, and 
what the blast radius of that change should be).

This leaves the sequence number, which is signed and unzigzagged as the only 
problem. I also checked, and I think there *may* be another issue with writing 
out {{i16}} integers. The protobuf varint spec says that {{i16}} should be 
written in no more bits than an {{i32}}; I'll have to check and see if that's 
the case.

> rs implementation compact protocol seq_id should not use zigzag encoding.
> -------------------------------------------------------------------------
>
>                 Key: THRIFT-5299
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5299
>             Project: Thrift
>          Issue Type: Bug
>          Components: Rust - Library
>    Affects Versions: 0.13.0
>            Reporter: shuo li
>            Priority: Major
>
> While reviewing code, I discovered a bug in compact protocol.
> The seq_id in message header is decoded with integer-encoding's i32, which 
> uses zigzag decode implicitly. 
>  
>  
> EDIT: correct the title and desc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to