andrei-ionescu opened a new issue #982: URL: https://github.com/apache/arrow-rs/issues/982
**Describe the bug**

Reading a Parquet file whose timestamp column contains a far-future date such as `9999-12-31 02:00:00` results in an overflow panic:

```
thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow'
```

**To Reproduce**

Steps to reproduce the behavior:

1. Download the attached zip file that contains the parquet file: [data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet.zip](https://github.com/apache/arrow-datafusion/files/7601988/data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet.zip)
2. Unzip it; it should give you the `data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet` file.
3. Create a new project with `cargo new read-parquet`, create a `data` folder in the project, and put the parquet file in it.
4. Modify the `Cargo.toml` file to contain the following:

```toml
[package]
name = "read-parquet"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = "1.14"
arrow = "6.0"
datafusion = "6.0"
```

5. Put the following code in `main.rs` to read the given parquet file:

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let mut ctx = ExecutionContext::new();

    /*
     * Parquet file schema:
     *
     * message spark_schema {
     *   optional binary licence_code (UTF8);
     *   optional binary vehicle_make (UTF8);
     *   optional binary fuel_type (UTF8);
     *   optional int96 dimension_load_date;
     * }
     */
    ctx.register_parquet(
        "vehicles",
        "./data/data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet",
    )
    .await?;

    let df = ctx
        .sql("
            SELECT
                licence_code,
                vehicle_make,
                fuel_type,
                CAST(dimension_load_date AS TIMESTAMP) AS dms
            FROM vehicles
            LIMIT 10
        ")
        .await?;

    df.show().await?;

    Ok(())
}
```

6. Execute `cargo run`.
Result:

```
thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow', /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
stack backtrace:
   0: rust_begin_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:107:14
   2: core::panicking::panic
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:48:5
   3: <parquet::arrow::converter::Int96ArrayConverter as parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert::{{closure}}::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
   4: core::option::Option<T>::map
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/option.rs:846:29
   5: <parquet::arrow::converter::Int96ArrayConverter as parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:30
   6: core::iter::adapters::map::map_fold::{{closure}}
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:84:28
   7: core::iter::traits::iterator::Iterator::fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:2171:21
   8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:124:9
   9: core::iter::traits::iterator::Iterator::for_each
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:737:9
  10: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_extend.rs:40:17
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_from_iter_nested.rs:56:9
  12: alloc::vec::source_iter_marker::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/source_iter_marker.rs:31:20
  13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/mod.rs:2549:9
  14: core::iter::traits::iterator::Iterator::collect
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1745:9
  15: <parquet::arrow::converter::Int96ArrayConverter as parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:177:13
  16: <parquet::arrow::converter::ArrayRefConverter<S,A,C> as parquet::arrow::converter::Converter<S,alloc::sync::Arc<dyn arrow::array::array::Array>>>::convert
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:450:9
  17: <parquet::arrow::array_reader::ComplexObjectArrayReader<T,C> as parquet::arrow::array_reader::ArrayReader>::next_batch
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:545:25
  18: <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:1130:27
  19: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:91:28
  20: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1995:21
  21: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:117:9
  22: <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:1127:30
  23: <parquet::arrow::arrow_reader::ParquetRecordBatchReader as core::iter::traits::iterator::Iterator>::next
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/arrow_reader.rs:175:15
  24: datafusion::physical_plan::file_format::parquet::read_partition
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/file_format/parquet.rs:424:19
  25: <datafusion::physical_plan::file_format::parquet::ParquetExec as datafusion::physical_plan::ExecutionPlan>::execute::{{closure}}::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/file_format/parquet.rs:213:29
  26: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/task.rs:42:21
  27: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:161:17
  28: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/loom/std/unsafe_cell.rs:14:9
  29: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:151:13
  30: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:461:19
  31: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panic/unwind_safe.rs:271:9
  32: std::panicking::try::do_call
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:406:40
  33: <unknown>
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/distinct_expressions.rs:127:15
  34: std::panicking::try
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:370:19
  35: std::panic::catch_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panic.rs:133:14
  36: tokio::runtime::task::harness::poll_future
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:449:18
  37: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:98:27
  38: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:53:15
  39: tokio::runtime::task::raw::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:113:5
  40: tokio::runtime::task::raw::RawTask::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:70:18
  41: tokio::runtime::task::UnownedTask<S>::run
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/mod.rs:379:9
  42: tokio::runtime::blocking::pool::Inner::run
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:264:17
  43: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:244:17
```

**Expected behavior**

To be able to read that parquet file. **The parquet file can be read with the `parquet-tools` CLI and Apache Spark.**

**Additional context**

The root cause is that the parquet file contains some rows with `9999-12-31 02:00:00` in the `dimension_load_date` column. **This future date is supported by Parquet and Spark.**

The content of the parquet file is:

```
+------------+------------------+------------------+-------------------+
|licence_code|vehicle_make      |fuel_type         |dimension_load_date|
+------------+------------------+------------------+-------------------+
|odc-odbl    |**Not Provided**  |**Not Provided**  |9999-12-31 02:00:00|
|odc-odbl    |**Not Applicable**|**Not Applicable**|9998-12-31 02:00:00|
|odc-odbl    |SAVIEM            |Petrol            |2021-06-09 03:02:37|
|odc-odbl    |YAMAHA            |Petrol            |2021-06-09 03:43:47|
|odc-odbl    |VAUXHALL          |Petrol            |2020-10-18 03:23:47|
|odc-odbl    |VAUXHALL          |Petrol            |2021-06-09 03:02:37|
|odc-odbl    |BMW               |Petrol            |2021-06-09 03:38:39|
|odc-odbl    |MG                |Petrol            |2020-10-18 03:23:47|
|odc-odbl    |PEUGEOT           |Diesel            |2020-10-18 03:35:16|
|odc-odbl    |FORD              |Diesel            |2020-10-18 03:23:47|
|odc-odbl    |FORD              |Petrol            |2020-10-18 03:12:55|
|odc-odbl    |SKODA             |Diesel            |2021-06-09 03:02:37|
|odc-odbl    |SHOGUN            |Diesel            |2020-10-18 03:12:55|
|odc-odbl    |MITSUBISHI        |Diesel            |2021-06-10 01:15:47|
+------------+------------------+------------------+-------------------+
```

To find out more about how the root cause was detected, you can follow apache/arrow-datafusion#1359.
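For reference, here is a minimal sketch of the arithmetic behind the panic. The millisecond constant and the `* 1_000_000` widening step are assumptions inferred from the stack trace pointing at the Int96-to-`TimestampNanosecondType` conversion in `converter.rs:179`, not code copied from the parquet crate:

```rust
// Back-of-the-envelope check of why converting a far-future timestamp to
// i64 nanoseconds overflows. All constants below are illustrative.
fn main() {
    // 9999-12-31 00:00:00 UTC expressed as milliseconds since the Unix epoch.
    let millis: i64 = 253_402_214_400_000;

    // Widening milliseconds to nanoseconds multiplies by 1_000_000 (assumed
    // to mirror what the Int96 converter does before building a
    // TimestampNanosecondArray).
    match millis.checked_mul(1_000_000) {
        Some(nanos) => println!("fits in i64: {} ns", nanos),
        // ~2.5e20 ns exceeds i64::MAX (~9.2e18), so an unchecked multiply
        // panics in debug builds with 'attempt to multiply with overflow'.
        None => println!("overflow: {} ms * 1_000_000 does not fit in i64", millis),
    }

    // The largest instant representable as i64 nanoseconds since the epoch
    // falls around 2262-04-11, far earlier than the 9999-12-31 rows here.
    println!("i64::MAX as seconds since epoch: {}", i64::MAX / 1_000_000_000);
}
```

In other words, any timestamp after roughly 2262-04-11 cannot be represented as i64 nanoseconds, so handling these rows would seem to require either a checked conversion that surfaces an error instead of panicking or a coarser target timestamp unit.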
