aarashy commented on issue #3215: URL: https://github.com/apache/arrow-rs/issues/3215#issuecomment-1331649397
I removed the unwraps here https://github.com/apache/arrow-rs/pull/3232 I have some bytes which reproduce this error, but the data is private. The bytes were the result of the `apache-arrow` NPM package function `tableToIPC` - https://github.com/apache/arrow/blob/5611f2bd0d6136b005d137a84b50709fc5c813bb/js/src/ipc/serialization.ts#L61 # More info about the error stacktrace above Let me share a little bit more about the error in `ArrayData::validate` above. 1. The value of `len_plus_offset` is 2565, and `offset` is 0. Thus, 2565 is the number of rows in the the record batch from higher in the callstack (`read_record_batch`). 2. 2565 is not divisible by 8 (2565/8 = 320.625) - so it seems correct for the `validate` function to be expecting 321 bytes if we need to fit 2565 bits within byte-boundaries (there is a ceiling function being applied on the division, which seems correct). Thus, I really want to understand why the buffer is given only 320 bytes. It seems like instead of correctly zero padding the 2565 bits to reach the next byte-boundary, the bits have truncated at the previous byte-boundary and the byte buffer is 1 byte too small. I'm struggling to really find the origin of the length of the Array data byte buffers here which would be performing this truncation... I suppose it would be `Message::header_as_dictionary_batch` -> `Message::buffers`? # Another similar error stacktrace I have another callstack which is throwing a similar panic on a different input bytes set. This one is from indexing into a slice out of bounds, again off-by-one. ``` thread 'tokio-runtime-worker' panicked at 'range end index 65 out of range for slice of length 64', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/arrow-buffer-28.0.0/src/util/bit_chunk_iterator.rs:57:23 stack backtrace: 0: rust_begin_unwind at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5 1: core::panicking::panic_fmt at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14 2: core::slice::index::slice_end_index_len_fail_rt at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/slice/index.rs:76:5 3: core::slice::index::slice_end_index_len_fail at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/slice/index.rs:69:9 4: arrow_buffer::util::bit_chunk_iterator::UnalignedBitChunk::new 5: arrow_buffer::buffer::immutable::Buffer::count_set_bits_offset 6: arrow_data::data::ArrayData::new_unchecked 7: arrow_data::data::ArrayDataBuilder::build_unchecked 8: arrow_ipc::reader::create_dictionary_array 9: arrow_ipc::reader::create_array 10: arrow_ipc::reader::read_record_batch ``` In this case: 1. The record batch is of length `518`, which is again not divisible by 8 (64.75). 2. The panic is coming from https://github.com/apache/arrow-rs/blob/master/arrow-buffer/src/util/bit_chunk_iterator.rs#L57 - the `UnalignedBitChunk` is being created downstream of the `count_nulls` routine in `ArrayData::new_unchecked`. This callstack doesn't perform validations or return `Result`, so it's awkward for me to add a slice-out-bounds check and return an error. To me this indicates, again, that the wherever the `null_bit_buffer` length is being initialized, it's being truncated incorrectly. 3. The `data_type` of the array being created here is `Dictionary(Int32, Utf8)`. I wonder if @tustvold, @alamb, or @HaoYang670 would know anything about improperly truncated / padded byte buffers during IPC byte reading. It's possible that my input is missing some padding, but perhaps this crate should generously perform padding which the Javascript apache-arrow crate is neglecting. It's also possible that this is a Javascript arrow package bug, but if we have an opportunity to be more permissive and flexible here, we should take it (and we at least shouldn't have panics), -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
