nicklan opened a new issue, #7712:
URL: https://github.com/apache/arrow-rs/issues/7712
**Describe the bug**
In arrow_json, `Decoder::decode` can panic if it encounters two high
surrogates in a row. Since this method returns a `Result`, panics are not
expected, even in error cases.
**To Reproduce**
The following program reproduces the bug:
```rust
use std::io::{BufRead, BufReader};
use std::sync::Arc;
use arrow::datatypes::{DataType, Field};
use arrow_json::ReaderBuilder;
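fn main() {
    // Build a decoder for a single nullable Utf8 column named "test".
    let mut decoder =
        ReaderBuilder::new_with_field(Arc::new(Field::new("test", DataType::Utf8, true)))
            .build_decoder()
            .unwrap();
    // Two high surrogates in a row: \uD800 is not followed by a low surrogate.
    let s = r#"{"test": "\uD800\uD801"}"#;
    let mut reader = BufReader::new(s.as_bytes());
    let buf = reader.fill_buf().unwrap();
    // Expected to return an Err, but this call panics instead.
    let _ = decoder.decode(buf);
}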
```
Running this gives:
```
thread 'main' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-json-55.1.0/src/reader/tape.rs:708:49:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:695:5
   1: core::panicking::panic_fmt
             at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:75:14
   2: core::panicking::panic_const::panic_const_sub_overflow
             at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:178:21
   3: arrow_json::reader::tape::char_from_surrogate_pair
             at [..]/arrow-json-55.1.0/src/reader/tape.rs:708:49
   4: arrow_json::reader::tape::TapeDecoder::decode
             at [..]/arrow-json-55.1.0/src/reader/tape.rs:514:37
   5: arrow_json::reader::Decoder::decode
             at [..]/arrow-json-55.1.0/src/reader/mod.rs:439:9
   6: arrow_panic::main
             at ./src/main.rs:15:13
   7: core::ops::function::FnOnce::call_once
             at [..]/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
```
**Expected behavior**
`decode` should return an error as the string is invalid, but it should not
panic.
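For illustration, here is a minimal sketch of a surrogate-pair combiner that validates both halves before subtracting, so that an input like `\uD800\uD801` yields an error rather than an overflow. This is not the arrow-json implementation: the function name `checked_char_from_surrogate_pair`, its argument order, and the `String` error type are assumptions made for this example. It also assumes the panic comes from subtracting the low-surrogate base `0xDC00` from a value below it, which the backtrace suggests but does not confirm.

```rust
/// Hypothetical helper (not part of arrow-json): combines a UTF-16
/// surrogate pair into a `char`, returning an error on invalid input.
fn checked_char_from_surrogate_pair(high: u16, low: u16) -> Result<char, String> {
    // A high (leading) surrogate must lie in 0xD800..=0xDBFF.
    if !(0xD800..=0xDBFF).contains(&high) {
        return Err(format!("expected high surrogate, found {high:#06X}"));
    }
    // A low (trailing) surrogate must lie in 0xDC00..=0xDFFF.
    // Checking this first means the subtraction below cannot underflow.
    if !(0xDC00..=0xDFFF).contains(&low) {
        return Err(format!("expected low surrogate, found {low:#06X}"));
    }
    // Standard UTF-16 decoding: 10 bits from each half, offset by 0x10000.
    let code_point =
        0x10000 + (((high as u32) - 0xD800) << 10) + ((low as u32) - 0xDC00);
    char::from_u32(code_point)
        .ok_or_else(|| format!("invalid code point {code_point:#X}"))
}
```

With this sketch, `checked_char_from_surrogate_pair(0xD83D, 0xDE00)` returns `Ok('\u{1F600}')`, while the pair from the reproduction, `(0xD800, 0xD801)`, returns an `Err` because the second value is not a low surrogate.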
**Additional context**