[
https://issues.apache.org/jira/browse/THRIFT-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235545#comment-17235545
]
Allen George edited comment on THRIFT-5299 at 11/19/20, 3:23 PM:
-----------------------------------------------------------------
II was curious about why this didn't break all the cross tests, so I took a
look at the code again. I noticed that the *only* time the compact protocol
spec uses (varint, non-zigzag) for a signed integer is for:
# collection, message, name sizes
# sequence numbers
It makes no sense for collection sizes and lengths to be negative, and indeed
the spec states that in a number of places. For example:
* {{byte length is the length of the byte array, using var int encoding (must
be >= 0)}} (Binary encoding)
* {{name length is the byte length of the name field, a signed 32 bit integer
encoded as a var int (must be >= 0).}} (Message)
* {{size is the size, a var int (int32), positive values 15 or higher}} (List
and set)
* {{size is the size, a var int (int32), strictly positive values}} (Map)
Indeed, for sizes the rust implementation takes an {{i32}} and immediately
casts it to a {{u32}} before encoding it onto the wire. I also checked
{{integer-rs}}, and as expected unsigned ints are encoded in varint-only form.
So, as far as the spec is concerned, for sizes the compact protocol is doing
the right thing (there's a separate question as to why sizes should *ever* be
negative in the library; I'll look into changing all sizes into {{u32}}, and
what the blast radius of that change should be).
This leaves the sequence number, which is signed and unzigzagged as the only
problem. I also checked, and I think there *may* be another issue with writing
out {{i16}} integers. The protobuf varint spec says that {{i16}} should be
written in no more bits than an {{i32}}; I'll have to check and see if that's
the case.
TL;DR:
* Only the sequence number may be a problem here.
* I'll have to check if writing {{i16}} uses more bytes than necessary.
* Field ids and collection/binary/other sizes should be represented in the rust
lib as {{usize}} or {{u##}} types instead of {{i##}} types.
was (Author: allengeorge):
II was curious about why this didn't break all the cross tests, so I took a
look at the code again. I noticed that the *only* time the compact protocol
spec uses (varint, non-zigzag) for a signed integer is for:
# collection, message, name sizes
# sequence numbers
It makes no sense for collection sizes and lengths to be negative, and indeed
the spec states that in a number of places. For example:
* {{byte length is the length of the byte array, using var int encoding (must
be >= 0)}} (Binary encoding)
* {{name length is the byte length of the name field, a signed 32 bit integer
encoded as a var int (must be >= 0).}} (Message)
* {{size is the size, a var int (int32), positive values 15 or higher}} (List
and set)
* {{size is the size, a var int (int32), strictly positive values}} (Map)
Indeed, for sizes the rust implementation takes an {{i32}} and immediately
casts it to a {{u32}} before encoding it onto the wire. I also checked
{{integer-rs}}, and as expected unsigned ints are encoded in varint-only form.
So, as far as the spec is concerned, for sizes the compact protocol is doing
the right thing (there's a separate question as to why sizes should *ever* be
negative in the library; I'll look into changing all sizes into {{u32}}, and
what the blast radius of that change should be).
This leaves the sequence number, which is signed and unzigzagged as the only
problem. I also checked, and I think there *may* be another issue with writing
out {{i16}} integers. The protobuf varint spec says that {{i16}} should be
written in no more bits than an {{i32}}; I'll have to check and see if that's
the case.
> rs implementation compact protocol seq_id should not use zigzag encoding.
> -------------------------------------------------------------------------
>
> Key: THRIFT-5299
> URL: https://issues.apache.org/jira/browse/THRIFT-5299
> Project: Thrift
> Issue Type: Bug
> Components: Rust - Library
> Affects Versions: 0.13.0
> Reporter: shuo li
> Priority: Major
>
> While reviewing code, I discovered a bug in compact protocol.
> The seq_id in message header is decoded with integer-encoding's i32, which
> uses zigzag decode implicitly.
>
>
> EDIT: correct the title and desc.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)