alamb opened a new issue #614:
URL: https://github.com/apache/arrow-rs/issues/614


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   The MIRI tool helps crates ensure they have no undefined behavior 
https://github.com/rust-lang/miri
   
   To improve the stability / security of the parquet crate, it would be cool 
to run MIRI against that too.
   
   **Describe the solution you'd like**
   
   Ideally, we would run same kind of MIRI checks on the parquet crate as we do 
now on arrow: 
https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60
   
   This means a command something like
   
   ```shell
   cargo +nightly miri  test  -p parquet
   ```
   
   Sadly, when I run this locally the very first test fails with an alignment 
issue
   
   ```shell
   (arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri  test 
 -p parquet
   WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not 
support wrapping.
      Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow)
      Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet)
       Finished test [unoptimized + debuginfo] target(s) in 4.53s
        Running unittests 
(target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c)
   
   running 456 tests
   test 
arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... 
error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 
is required
       --> parquet/src/util/bit_packing.rs:150:12
        |
   150  |     *out = (*in_buf) % (1u32 << 2);
        |            ^^^^^^^^^ accessing memory with alignment 1, but alignment 
4 is required
        |
        = help: this indicates a bug in the program: it performed an invalid 
operation, and caused Undefined Behavior
        = help: see 
https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html 
for further information
                
        = note: inside `util::bit_packing::unpack2_32` at 
parquet/src/util/bit_packing.rs:150:12
   note: inside `util::bit_packing::unpack32` at 
parquet/src/util/bit_packing.rs:37:14
       --> parquet/src/util/bit_packing.rs:37:14
        |
   37   |         2 => unpack2_32(in_ptr, out_ptr),
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside `util::bit_util::BitReader::get_batch::<i16>` at 
parquet/src/util/bit_util.rs:568:30
       --> parquet/src/util/bit_util.rs:568:30
        |
   568  |                     in_ptr = unpack32(in_ptr, out_ptr, num_bits);
        |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at 
parquet/src/encodings/rle.rs:415:30
       --> parquet/src/encodings/rle.rs:415:30
        |
   415  |                   num_values = bit_reader.get_batch::<T>(
        |  ______________________________^
   416  | |                     &mut buffer[values_read..values_read + 
num_values],
   417  | |                     self.bit_width as usize,
   418  | |                 );
        | |_________________^
   note: inside `encodings::levels::LevelDecoder::get` at 
parquet/src/encodings/levels.rs:262:35
       --> parquet/src/encodings/levels.rs:262:35
        |
   262  |                 let values_read = decoder.get_batch::<i16>(&mut 
buffer[0..len])?;
        |                                   
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside 
`column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` 
at parquet/src/column/reader.rs:459:9
       --> parquet/src/column/reader.rs:459:9
        |
   459  |         level_decoder.get(buffer)
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside 
`column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at 
parquet/src/column/reader.rs:210:38
       --> parquet/src/column/reader.rs:210:38
        |
   210  |                       num_def_levels = self.read_def_levels(
        |  ______________________________________^
   211  | |                         &mut levels[levels_read..levels_read + 
iter_batch_size],
   212  | |                     )?;
        | |_____________________^
   note: inside 
`<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, 
arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>,
 arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> 
as arrow::array_reader::ArrayReader>::next_batch` at 
parquet/src/arrow/array_reader.rs:490:17
       --> parquet/src/arrow/array_reader.rs:490:17
        |
   490  | /                 self.column_reader.as_mut().unwrap().read_batch(
   491  | |                     num_to_read,
   492  | |                     cur_def_levels_buf,
   493  | |                     cur_rep_levels_buf,
   494  | |                     cur_data_buf,
   495  | |                 )?;
        | |_________________^
   note: inside 
`arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at 
parquet/src/arrow/array_reader.rs:2229:21
       --> parquet/src/arrow/array_reader.rs:2229:21
        |
   2229 |         let array = array_reader.next_batch(values_per_page / 
2).unwrap();
        |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   note: inside closure at parquet/src/arrow/array_reader.rs:2154:5
       --> parquet/src/arrow/array_reader.rs:2154:5
        |
   2153 |       #[test]
        |       ------- in this procedural macro expansion
   2154 | /     fn test_complex_array_reader_def_and_rep_levels() {
   2155 | |         // Construct column schema
   2156 | |         let message_type = "
   2157 | |         message test_schema {
   ...    |
   2276 | |         );
   2277 | |     }
        | |_____^
        = note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 
2277:6] as std::ops::FnOnce<()>>::call_once - shim` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` 
at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `test::__rust_begin_short_backtrace::<fn()>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5
        = note: inside closure at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30
        = note: inside `<[closure@test::run_test::{closure#2}] as 
std::ops::FnOnce<()>>::call_once - shim(vtable)` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + 
std::marker::Send> as std::ops::FnOnce<()>>::call_once` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9
        = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn 
std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` 
at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9
        = note: inside 
`std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn
 std::ops::FnOnce() + std::marker::Send>>, ()>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
        = note: inside `std::panicking::r#try::<(), 
std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + 
std::marker::Send>>>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
        = note: inside 
`std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn 
std::ops::FnOnce() + std::marker::Send>>, ()>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
        = note: inside `test::run_test_in_process` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18
        = note: inside closure at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39
        = note: inside `test::run_test::run_test_inner` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13
        = note: inside `test::run_test` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28
        = note: inside 
`test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17
        = note: inside `test::run_tests_console` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
        = note: inside `test::test_main` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15
        = note: inside `test::test_main_static` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5
        = note: inside `main`
        = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` 
at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
        = note: inside 
`std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
        = note: inside closure at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
        = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> 
for &dyn std::ops::Fn() -> i32 + std::marker::Sync + 
std::panic::RefUnwindSafe>::call_once` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
        = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> 
i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
        = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 
+ std::marker::Sync + std::panic::RefUnwindSafe>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
        = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + 
std::marker::Sync + std::panic::RefUnwindSafe, i32>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
        = note: inside closure at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
        = note: inside 
`std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}],
 isize>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
        = note: inside `std::panicking::r#try::<isize, 
[closure@std::rt::lang_start_internal::{closure#2}]>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
        = note: inside 
`std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}],
 isize>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
        = note: inside `std::rt::lang_start_internal` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
        = note: inside `std::rt::lang_start::<()>` at 
/Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5
        = note: this error originates in the attribute macro `test` (in Nightly 
builds, run with -Z macro-backtrace for more info)
   
   
   ```
   
   **Describe alternatives you've considered**
   The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from 
@jorgecarleitao apparently already has parquet running under MIR this already 
running, so depending on if that gets integrated into the main arrow-rs repo, 
that might change how we go about getting a clean run
   
   
   
   **Additional context**
   Thanks to help from @roee88 , @jhorstmann  and others who I probably missed, 
we now have MIRI checks validated for the arrow crate on each PR: 
https://github.com/apache/arrow-rs/runs/3154551440
   
   Which runs a command such as:
   ```shell
     cargo miri test -p arrow -- --skip csv --skip ipc --skip json
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to