nicklan opened a new issue, #7712:
URL: https://github.com/apache/arrow-rs/issues/7712

   **Describe the bug**
   
   In `arrow_json`, `Decoder::decode` panics when a JSON string contains two
   consecutive high-surrogate escapes (for example `\uD800\uD801`). Because the
   method returns a `Result`, callers do not expect a panic, even for invalid input.
   
   **To Reproduce**
   
   The following program reproduces the bug:
   ```rust
   use std::io::{BufRead, BufReader};
   use std::sync::Arc;
   
   use arrow::datatypes::{DataType, Field};
   use arrow_json::ReaderBuilder;
   
   fn main() {
       let mut decoder =
           ReaderBuilder::new_with_field(Arc::new(Field::new("test", DataType::Utf8, true)))
               .build_decoder()
               .unwrap();
       let s = r#"{"test": "\uD800\uD801"}"#;
       let mut reader = BufReader::new(s.as_bytes());
       let buf = reader.fill_buf().unwrap();
       let _ = decoder.decode(buf);
   }
   ```
   
   Running this gives:
   ```
   thread 'main' panicked at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-json-55.1.0/src/reader/tape.rs:708:49:
   attempt to subtract with overflow
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/std/src/panicking.rs:695:5
      1: core::panicking::panic_fmt
                at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:75:14
      2: core::panicking::panic_const::panic_const_sub_overflow
                at /rustc/05f9846f893b09a1be1fc8560e33fc3c815cfecb/library/core/src/panicking.rs:178:21
      3: arrow_json::reader::tape::char_from_surrogate_pair
                at [..]/arrow-json-55.1.0/src/reader/tape.rs:708:49
      4: arrow_json::reader::tape::TapeDecoder::decode
                at [..]/arrow-json-55.1.0/src/reader/tape.rs:514:37
      5: arrow_json::reader::Decoder::decode
                at [..]/arrow-json-55.1.0/src/reader/mod.rs:439:9
      6: arrow_panic::main
                at ./src/main.rs:15:13
      7: core::ops::function::FnOnce::call_once
                at [..]/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
   ```
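   
   The `attempt to subtract with overflow` message is consistent with unchecked
   `u16` arithmetic on a second code unit that is not a low surrogate. The sketch
   below illustrates this under an assumption: that the conversion subtracts
   `0xDC00` from the second unit, as the standard UTF-16 decoding formula does.
   The actual body of `char_from_surrogate_pair` is not shown in this report, and
   `low_offset` is a hypothetical helper used only for illustration.
   
   ```rust
   /// Hypothetical helper (not part of arrow-json): offset of a code unit
   /// within the low-surrogate range, or `None` if subtracting 0xDC00
   /// would underflow.
   fn low_offset(unit: u16) -> Option<u16> {
       unit.checked_sub(0xDC00)
   }
   
   fn main() {
       // A genuine low surrogate is >= 0xDC00, so the subtraction succeeds.
       assert_eq!(low_offset(0xDC01), Some(1));
       // 0xD801 is a second *high* surrogate and is below 0xDC00, so a plain
       // `unit - 0xDC00` would overflow here.
       assert_eq!(low_offset(0xD801), None);
   }
   ```
   
   Plain `-` on integers panics on overflow only when overflow checks are
   enabled, which is the default for debug builds; with checks disabled the
   subtraction wraps instead of panicking.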
   
   **Expected behavior**
   
   `decode` should return an error because the escaped surrogates do not form a
   valid pair; it should not panic.
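   
   One possible shape for a non-panicking conversion is sketched below. It is
   an illustration only, not the arrow-json implementation or its eventual fix:
   the function name `checked_char_from_surrogate_pair`, its argument order, and
   the `String` error type are all assumptions (the real crate would presumably
   use its own error type). The idea is to validate both ranges before doing any
   subtraction.
   
   ```rust
   /// Hypothetical sketch: combine a UTF-16 surrogate pair into a `char`,
   /// returning an error instead of panicking on malformed input.
   fn checked_char_from_surrogate_pair(high: u16, low: u16) -> Result<char, String> {
       // Validate both ranges first so the subtractions below cannot underflow.
       if !(0xD800..=0xDBFF).contains(&high) {
           return Err(format!("expected high surrogate, got {high:#06X}"));
       }
       if !(0xDC00..=0xDFFF).contains(&low) {
           return Err(format!("expected low surrogate, got {low:#06X}"));
       }
       // Standard UTF-16 decoding: 10 bits from each unit, offset by 0x10000.
       let n = 0x1_0000 + (((high - 0xD800) as u32) << 10) + (low - 0xDC00) as u32;
       char::from_u32(n).ok_or_else(|| format!("invalid code point {n:#X}"))
   }
   
   fn main() {
       // U+1F600 is encoded in UTF-16 as the pair D83D DE00.
       assert_eq!(
           checked_char_from_surrogate_pair(0xD83D, 0xDE00),
           Ok('\u{1F600}')
       );
       // The input from this report: two high surrogates in a row.
       assert!(checked_char_from_surrogate_pair(0xD800, 0xD801).is_err());
   }
   ```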
   

