Arthur Lazzaretti created THRIFT-5928:
-----------------------------------------
Summary: skip() call on unknown binary field fails deserialization
instead of graceful skipping over field
Key: THRIFT-5928
URL: https://issues.apache.org/jira/browse/THRIFT-5928
Project: Thrift
Issue Type: Bug
Components: Rust - Library
Affects Versions: 0.21.0, 0.17.0
Reporter: Arthur Lazzaretti
*TInputProtocol::skip_till_depth()* fails when skipping a field that contains
non-UTF-8 binary data.
In *src/protocol/mod.rs*, the *TType::String* branch calls *read_string()*:
{noformat}
TType::String => self.read_string().map(|_| ()),{noformat}
*read_string()* internally does *read_bytes()* followed by
*String::from_utf8()*, which rejects any byte sequence that is not valid UTF-8.
Since *string* and *binary* share the same wire type (*TType::String = 11*),
*skip()* cannot distinguish between them. When it encounters a binary field
containing raw bytes. it produces:
{noformat}
ProtocolError { kind: InvalidData, message: "invalid utf-8 sequence of 1 bytes
from index 1" }{noformat}
***When this triggers:*** Any time a newer schema adds a *binary* field to a
Thrift struct - if an older client tries to deserialize the new field it
doesn't recognize, it hits this from the *skip()* path in
*read_from_in_protocol*
***Fix:*** Inside {*}skip_till_depth(){*}, replace:
{noformat}
TType::String => self.read_string().map(|_| ()){noformat}
with
{noformat}
TType::String => self.read_bytes().map(|_| ()){noformat}
This is safe because:
- The result is discarded immediately (*.map(|_| ())*)
- *read_string()* is *read_bytes()* + *String::from_utf8()* — both consume
identical wire bytes
- No behavioral change for valid UTF-8 string data
- No effect on parsing of known fields, which use their own typed readers
--
This message was sent by Atlassian Jira
(v8.20.10#820010)