andrei-ionescu opened a new issue #1359:
URL: https://github.com/apache/arrow-datafusion/issues/1359
**Describe the bug**
Reading a Parquet file with an `int96` column results in a panic with the following error:
```
thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow', /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
```
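The panic points at the Int96 conversion in `parquet-6.2.0`'s `converter.rs:179`, where the Int96 timestamp is collapsed to a single `i64` of nanoseconds for `TimestampNanosecondType`. A minimal sketch of the failure mode, assuming the converter scales an intermediate millisecond value to nanoseconds with a plain `*` (which panics on overflow in debug builds); the function names here are illustrative, not the parquet crate's actual API:

```rust
// Illustrative only: the same arithmetic shape as the panic site
// (scaling a timestamp value up to nanoseconds), not the real code.
fn to_nanos_unchecked(millis: i64) -> i64 {
    millis * 1_000_000 // "attempt to multiply with overflow" in debug builds
}

fn to_nanos_checked(millis: i64) -> Option<i64> {
    millis.checked_mul(1_000_000) // returns None instead of panicking
}

fn main() {
    // A sane value converts fine either way.
    assert_eq!(to_nanos_unchecked(1_000), 1_000_000_000);
    // An ordinary 2021 timestamp in milliseconds fits in i64 nanoseconds.
    assert_eq!(to_nanos_checked(1_623_277_533_000), Some(1_623_277_533_000_000_000));
    // A corrupt or far-out-of-range value does not.
    assert_eq!(to_nanos_checked(i64::MAX / 1_000), None);
}
```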
**To Reproduce**
Steps to reproduce the behavior:
1. Download the attached zip file that contains the parquet file:
[data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet.zip](https://github.com/apache/arrow-datafusion/files/7601988/data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet.zip)
2. Unzip it to get the
`data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet` file.
3. Create a new project with `cargo new read-parquet`, create a `data`
folder in the project, and put the Parquet file in it.
4. Modify the `Cargo.toml` file to contain the following:
```toml
[package]
name = "read-parquet"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = { version = "1.14", features = ["macros", "rt-multi-thread"] }
arrow = "6.0"
datafusion = "6.0"
```
5. Put the following code in `main.rs` to read the Parquet file:
```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let mut ctx = ExecutionContext::new();

    /*
     * Parquet file schema:
     *
     * message spark_schema {
     *   optional binary licence_code (UTF8);
     *   optional binary vehicle_make (UTF8);
     *   optional binary fuel_type (UTF8);
     *   optional int96 dimension_load_date;
     * }
     */
    ctx.register_parquet(
        "vehicles",
        "./data/data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet",
    )
    .await?;

    let df = ctx
        .sql(
            "
            SELECT
                licence_code,
                vehicle_make,
                fuel_type,
                CAST(dimension_load_date AS TIMESTAMP) AS dms
            FROM vehicles
            LIMIT 10
            ",
        )
        .await?;

    df.show().await?;

    Ok(())
}
```
6. Execute `cargo run`.
7. The run panics:
```
thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow', /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
stack backtrace:
0: rust_begin_unwind
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:107:14
2: core::panicking::panic
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:48:5
3: <parquet::arrow::converter::Int96ArrayConverter as
parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert::{{closure}}::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
4: core::option::Option<T>::map
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/option.rs:846:29
5: <parquet::arrow::converter::Int96ArrayConverter as
parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:30
6: core::iter::adapters::map::map_fold::{{closure}}
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:84:28
7: core::iter::traits::iterator::Iterator::fold
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:2171:21
8: <core::iter::adapters::map::Map<I,F> as
core::iter::traits::iterator::Iterator>::fold
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:124:9
9: core::iter::traits::iterator::Iterator::for_each
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:737:9
10: <alloc::vec::Vec<T,A> as
alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_extend.rs:40:17
11: <alloc::vec::Vec<T> as
alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_from_iter_nested.rs:56:9
12: alloc::vec::source_iter_marker::<impl
alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/source_iter_marker.rs:31:20
13: <alloc::vec::Vec<T> as
core::iter::traits::collect::FromIterator<T>>::from_iter
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/mod.rs:2549:9
14: core::iter::traits::iterator::Iterator::collect
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1745:9
15: <parquet::arrow::converter::Int96ArrayConverter as
parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:177:13
16: <parquet::arrow::converter::ArrayRefConverter<S,A,C> as
parquet::arrow::converter::Converter<S,alloc::sync::Arc<dyn
arrow::array::array::Array>>>::convert
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:450:9
17: <parquet::arrow::array_reader::ComplexObjectArrayReader<T,C> as
parquet::arrow::array_reader::ArrayReader>::next_batch
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:545:25
18: <parquet::arrow::array_reader::StructArrayReader as
parquet::arrow::array_reader::ArrayReader>::next_batch::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:1130:27
19: core::iter::adapters::map::map_try_fold::{{closure}}
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:91:28
20: core::iter::traits::iterator::Iterator::try_fold
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1995:21
21: <core::iter::adapters::map::Map<I,F> as
core::iter::traits::iterator::Iterator>::try_fold
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:117:9
22: <parquet::arrow::array_reader::StructArrayReader as
parquet::arrow::array_reader::ArrayReader>::next_batch
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:1127:30
23: <parquet::arrow::arrow_reader::ParquetRecordBatchReader as
core::iter::traits::iterator::Iterator>::next
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/arrow_reader.rs:175:15
24: datafusion::physical_plan::file_format::parquet::read_partition
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/file_format/parquet.rs:424:19
25: <datafusion::physical_plan::file_format::parquet::ParquetExec as
datafusion::physical_plan::ExecutionPlan>::execute::{{closure}}::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/file_format/parquet.rs:213:29
26: <tokio::runtime::blocking::task::BlockingTask<T> as
core::future::future::Future>::poll
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/task.rs:42:21
27: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:161:17
28: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/loom/std/unsafe_cell.rs:14:9
29: tokio::runtime::task::core::CoreStage<T>::poll
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:151:13
30: tokio::runtime::task::harness::poll_future::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:461:19
31: <core::panic::unwind_safe::AssertUnwindSafe<F> as
core::ops::function::FnOnce<()>>::call_once
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panic/unwind_safe.rs:271:9
32: std::panicking::try::do_call
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:406:40
33: <unknown>
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/distinct_expressions.rs:127:15
34: std::panicking::try
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:370:19
35: std::panic::catch_unwind
at
/rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panic.rs:133:14
36: tokio::runtime::task::harness::poll_future
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:449:18
37: tokio::runtime::task::harness::Harness<T,S>::poll_inner
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:98:27
38: tokio::runtime::task::harness::Harness<T,S>::poll
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:53:15
39: tokio::runtime::task::raw::poll
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:113:5
40: tokio::runtime::task::raw::RawTask::poll
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:70:18
41: tokio::runtime::task::UnownedTask<S>::run
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/mod.rs:379:9
42: tokio::runtime::blocking::pool::Inner::run
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:264:17
43: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
at
/Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:244:17
```
I've tried the following variations, with these results:
- using DataFrame api instead of SQL -- _error_ 🔴
- with and without `CAST` to timestamp -- _error_ 🔴
- not selecting the `dimension_load_date` -- _success_ 🟢
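Since everything works once `dimension_load_date` is excluded, the fix presumably belongs in the Int96 conversion itself: checked arithmetic that surfaces out-of-range values as an error (or `None`) instead of aborting the thread. A sketch under the standard Int96 layout (a Julian day number plus nanoseconds elapsed within that day); the names are illustrative, not the parquet crate's API:

```rust
// Int96 encodes a timestamp as a Julian day plus nanoseconds within
// that day. Julian day 2,440,588 corresponds to the Unix epoch.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const SECONDS_PER_DAY: i64 = 86_400;

/// Convert to Unix nanoseconds with checked arithmetic, so timestamps
/// that do not fit an i64 of nanoseconds yield None instead of a panic.
fn int96_to_unix_nanos(julian_day: i64, nanos_of_day: i64) -> Option<i64> {
    let days = julian_day.checked_sub(JULIAN_DAY_OF_EPOCH)?;
    let seconds = days.checked_mul(SECONDS_PER_DAY)?;
    seconds
        .checked_mul(1_000_000_000)?
        .checked_add(nanos_of_day)
}

fn main() {
    // Midnight at the Unix epoch is Julian day 2,440,588.
    assert_eq!(int96_to_unix_nanos(2_440_588, 0), Some(0));
    // A garbage Julian day overflows i64 nanoseconds and yields None.
    assert_eq!(int96_to_unix_nanos(200_000_000, 0), None);
}
```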
**Expected behavior**
DataFusion should be able to read this Parquet file. The same file reads
fine with the `parquet-tools` CLI and with Apache Spark.
**Additional context**
OS: `macOS 12.0.1` (Monterey)
Rust: `rustc 1.58.0-nightly (65c55bf93 2021-11-23)`
Cargo: `cargo 1.58.0-nightly (e1fb17631 2021-11-22)`
I transformed the parquet file into CSV and everything worked as expected.