[
https://issues.apache.org/jira/browse/THRIFT-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jens Geyer resolved THRIFT-5928.
--------------------------------
Fix Version/s: 0.23.0
Assignee: Arthur Lazzaretti
Resolution: Fixed
> skip() call on unknown binary field fails deserialization instead of graceful
> skipping over field
> -------------------------------------------------------------------------------------------------
>
> Key: THRIFT-5928
> URL: https://issues.apache.org/jira/browse/THRIFT-5928
> Project: Thrift
> Issue Type: Bug
> Components: Rust - Library
> Affects Versions: 0.17.0, 0.21.0
> Reporter: Arthur Lazzaretti
> Assignee: Arthur Lazzaretti
> Priority: Major
> Fix For: 0.23.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> *TInputProtocol::skip_till_depth()* fails when skipping a field that contains
> non-UTF-8 binary data.
> In {*}src/protocol/mod.rs{*}, the *TType::String* branch calls
> {*}read_string(){*}:
> {noformat}
> TType::String => self.read_string().map(|_| ()){noformat}
> *read_string()* internally does *read_bytes()* followed by
> {*}String::from_utf8(){*}, which rejects any byte sequence that is not valid
> UTF-8. Since *string* and *binary* share the same wire type ({*}TType::String
> = 11{*}), *skip()* cannot distinguish between them. When it encounters a
> binary field containing raw bytes. it produces:
> {noformat}
> ProtocolError { kind: InvalidData, message: "invalid utf-8 sequence of 1
> bytes from index 1" }{noformat}
>
> *When the bug triggers:* Any time a newer schema adds a *binary* field to a
> Thrift struct - if an older client tries to deserialize the new field it
> doesn't recognize, it hits this from the *skip()* path in
> *read_from_in_protocol*
>
> *Fix:* Inside {*}skip_till_depth(){*}, replace:
> {noformat}
> TType::String => self.read_string().map(|_| ()){noformat}
> with
> {noformat}
> TType::String => self.read_bytes().map(|_| ()){noformat}
>
> Safe because:
> - The result is discarded immediately ({*}.map(|_| ()){*})
> - *read_string()* is *read_bytes()* + *String::from_utf8()* — both consume
> identical wire bytes
> - No behavioral change for valid UTF-8 string data
> - No effect on parsing of known fields, which use their own typed readers
>
> PR: https://github.com/apache/thrift/pull/3344
--
This message was sent by Atlassian Jira
(v8.20.10#820010)