alamb opened a new issue #614: URL: https://github.com/apache/arrow-rs/issues/614
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The MIRI tool helps crates ensure they have no undefined behavior https://github.com/rust-lang/miri To improve the stability / security of the parquet crate, it would be cool to run MIRI against that too. **Describe the solution you'd like** Ideally, we would run same kind of MIRI checks on the parquet crate as we do now on arrow: https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60 This means a command something like ```shell cargo +nightly miri test -p parquet ``` Sadly, when I run this locally the very first test fails with an alignment issue ```shell (arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri test -p parquet WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not support wrapping. Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow) Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet) Finished test [unoptimized + debuginfo] target(s) in 4.53s Running unittests (target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c) running 456 tests test arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required --> parquet/src/util/bit_packing.rs:150:12 | 150 | *out = (*in_buf) % (1u32 << 2); | ^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required | = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information = note: inside `util::bit_packing::unpack2_32` at parquet/src/util/bit_packing.rs:150:12 note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:37:14 --> parquet/src/util/bit_packing.rs:37:14 | 37 | 2 => unpack2_32(in_ptr, out_ptr), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `util::bit_util::BitReader::get_batch::<i16>` at parquet/src/util/bit_util.rs:568:30 --> parquet/src/util/bit_util.rs:568:30 | 568 | in_ptr = unpack32(in_ptr, out_ptr, num_bits); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at parquet/src/encodings/rle.rs:415:30 --> parquet/src/encodings/rle.rs:415:30 | 415 | num_values = bit_reader.get_batch::<T>( | ______________________________^ 416 | | &mut buffer[values_read..values_read + num_values], 417 | | self.bit_width as usize, 418 | | ); | |_________________^ note: inside `encodings::levels::LevelDecoder::get` at parquet/src/encodings/levels.rs:262:35 --> parquet/src/encodings/levels.rs:262:35 | 262 | let values_read = decoder.get_batch::<i16>(&mut buffer[0..len])?; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` at parquet/src/column/reader.rs:459:9 --> parquet/src/column/reader.rs:459:9 | 459 | level_decoder.get(buffer) | ^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at parquet/src/column/reader.rs:210:38 --> parquet/src/column/reader.rs:210:38 | 210 | num_def_levels = self.read_def_levels( | ______________________________________^ 211 | | &mut levels[levels_read..levels_read + iter_batch_size], 212 | | )?; | |_____________________^ note: inside `<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>, arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> as arrow::array_reader::ArrayReader>::next_batch` at parquet/src/arrow/array_reader.rs:490:17 --> parquet/src/arrow/array_reader.rs:490:17 | 490 | / self.column_reader.as_mut().unwrap().read_batch( 491 | | num_to_read, 492 | | cur_def_levels_buf, 493 | | cur_rep_levels_buf, 494 | | cur_data_buf, 495 | | )?; | |_________________^ note: inside `arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at parquet/src/arrow/array_reader.rs:2229:21 --> parquet/src/arrow/array_reader.rs:2229:21 | 2229 | let array = array_reader.next_batch(values_per_page / 2).unwrap(); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside closure at parquet/src/arrow/array_reader.rs:2154:5 --> parquet/src/arrow/array_reader.rs:2154:5 | 2153 | #[test] | ------- in this procedural macro expansion 2154 | / fn test_complex_array_reader_def_and_rep_levels() { 2155 | | // Construct column schema 2156 | | let message_type = " 2157 | | message test_schema { ... | 2276 | | ); 2277 | | } | |_____^ = note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 2277:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5 = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30 = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9 = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9 = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40 = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19 = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14 = note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18 = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39 = note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13 = note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28 = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17 = note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5 = note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15 = note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5 = note: inside `main` = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18 = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18 = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13 = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40 = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19 = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14 = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48 = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40 = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19 = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14 = note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20 = note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5 = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info) ``` **Describe alternatives you've considered** The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from @jorgecarleitao apparently already has parquet running under MIR this already running, so depending on if that gets integrated into the main arrow-rs repo, that might change how we go about getting a clean run **Additional context** Thanks to help from @roee88 , @jhorstmann and others who I probably missed, we now have MIRI checks validated for the arrow crate on each PR: https://github.com/apache/arrow-rs/runs/3154551440 Which runs a command such as: ```shell cargo miri test -p arrow -- --skip csv --skip ipc --skip json ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
