zhzy0077 opened a new issue, #5810:
URL: https://github.com/apache/arrow-datafusion/issues/5810
### Describe the bug
When computing statistics, selectivity is estimated from the distance
covered by the selected range relative to the distance covered by the total
range (see `analyze_expr_scalar_comparison`).
To compute a distance, `ScalarValue::distance` simply subtracts MIN from MAX.
When that subtraction overflows, it either panics or returns a negative
number.
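To illustrate the failure mode outside DataFusion, here is a minimal sketch using plain `i64` values. The three helper names are hypothetical and are not part of DataFusion's API; `wrapping_sub` is used only to model what an unchecked subtraction does in a build without overflow checks.

```rust
/// Models a plain `max - min` in a build without overflow checks: it wraps.
/// (Hypothetical helper, for illustration only.)
fn wrapping_distance(min: i64, max: i64) -> i64 {
    max.wrapping_sub(min)
}

/// Like `max - min`, but reports overflow as `None` instead of panicking.
/// (Hypothetical helper, for illustration only.)
fn checked_distance(min: i64, max: i64) -> Option<i64> {
    max.checked_sub(min)
}

/// Overflow-safe width of the range: `abs_diff` returns a `u64`,
/// which is wide enough for the distance between any two `i64` values.
/// (Hypothetical helper, for illustration only.)
fn safe_distance(min: i64, max: i64) -> Option<u64> {
    if min > max {
        None
    } else {
        Some(max.abs_diff(min))
    }
}
```

For `min = i64::MIN` and `max = i64::MAX`, `wrapping_distance` yields `-1`, `checked_distance` yields `None`, and `safe_distance` yields `Some(u64::MAX)`. Whether a fix should widen the result type or simply give up on the estimate when the subtraction overflows is a design choice left open here.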
### To Reproduce
Create a parquet file whose `value` column has max value `i64::MAX` and min value `i64::MIN`.
Run a query like:
```rust
let ctx = SessionContext::new();
let df = ctx
    .read_parquet("<file>.parquet", ParquetReadOptions::default())
    .await?;
let df = df
    // Keep only negative values; planning this filter triggers the statistics analysis.
    .filter(col("value").lt(lit(0_i64)))?
    .aggregate(vec![], vec![max(col("value"))])?;
df.show().await?;
```
### Expected behavior
The query runs without panicking and shows the result.
### Additional context
It panics when run in debug mode:
```
thread 'main' panicked at 'attempt to subtract with overflow', ~/repo/arrow-datafusion/datafusion/common/src/scalar.rs:1816:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:114:5
   3: datafusion_common::scalar::ScalarValue::sub
             at ~/repo/arrow-datafusion/datafusion/common/src/scalar.rs:1816:9
   4: datafusion_common::scalar::ScalarValue::distance
             at ~/repo/arrow-datafusion/datafusion/common/src/scalar.rs:1885:13
   5: datafusion_physical_expr::expressions::binary::analyze_expr_scalar_comparison
             at ~/repo/arrow-datafusion/datafusion/physical-expr/src/expressions/binary.rs:861:57
   6: <datafusion_physical_expr::expressions::binary::BinaryExpr as datafusion_physical_expr::physical_expr::PhysicalExpr>::analyze
             at ~/repo/arrow-datafusion/datafusion/physical-expr/src/expressions/binary.rs:732:25
   7: <datafusion::physical_plan::filter::FilterExec as datafusion::physical_plan::ExecutionPlan>::statistics
             at ~/repo/arrow-datafusion/datafusion/core/src/physical_plan/filter.rs:183:28
   8: datafusion::physical_optimizer::aggregate_statistics::take_optimizable
             at ~/repo/arrow-datafusion/datafusion/core/src/physical_optimizer/aggregate_statistics.rs:127:37
   9: <datafusion::physical_optimizer::aggregate_statistics::AggregateStatistics as datafusion::physical_optimizer::optimizer::PhysicalOptimizerRule>::optimize
             at ~/repo/arrow-datafusion/datafusion/core/src/physical_optimizer/aggregate_statistics.rs:56:41
  10: datafusion::physical_plan::planner::DefaultPhysicalPlanner::optimize_internal
             at ~/repo/arrow-datafusion/datafusion/core/src/physical_plan/planner.rs:1794:24
  11: <datafusion::physical_plan::planner::DefaultPhysicalPlanner as datafusion::physical_plan::planner::PhysicalPlanner>::create_physical_plan::{{closure}}
             at ~/repo/arrow-datafusion/datafusion/core/src/physical_plan/planner.rs:427:17
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/future/future.rs:125:9
  13: <datafusion::execution::context::DefaultQueryPlanner as datafusion::execution::context::QueryPlanner>::create_physical_plan::{{closure}}
             at ~/repo/arrow-datafusion/datafusion/core/src/execution/context.rs:1175:13
  14: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/future/future.rs:125:9
  15: datafusion::execution::context::SessionState::create_physical_plan::{{closure}}
             at ~/repo/arrow-datafusion/datafusion/core/src/execution/context.rs:1670:13
  16: datafusion::dataframe::DataFrame::create_physical_plan::{{closure}}
             at ~/repo/arrow-datafusion/datafusion/core/src/dataframe.rs:99:60
  17: datafusion::dataframe::DataFrame::collect::{{closure}}
             at ~/repo/arrow-datafusion/datafusion/core/src/dataframe.rs:663:47
  18: datafusion::dataframe::DataFrame::show::{{closure}}
             at ~/repo/arrow-datafusion/datafusion/core/src/dataframe.rs:681:37
  19: rust_sample::main::{{closure}}
             at ./src/main.rs:33:14
  20: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/park.rs:283:63
  21: tokio::runtime::coop::with_budget
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/coop.rs:107:5
  22: tokio::runtime::coop::budget
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/coop.rs:73:5
  23: tokio::runtime::park::CachedParkThread::block_on
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/park.rs:283:31
  24: tokio::runtime::context::BlockingRegionGuard::block_on
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/context.rs:315:13
  25: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/scheduler/multi_thread/mod.rs:66:9
  26: tokio::runtime::runtime::Runtime::block_on
             at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/runtime.rs:304:45
  27: rust_sample::main
             at ./src/main.rs:69:5
  28: core::ops::function::FnOnce::call_once
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/ops/function.rs:250:5
```
It doesn't panic in release mode, but the planner may then pick sub-optimal
plans because the statistics are wrong.
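As a rough sketch of how a wrapped distance can distort a selectivity estimate for a predicate like `value < bound`: the ratio formula and both function names below are assumptions made for illustration, not DataFusion's actual code, and a uniform distribution of values is assumed.

```rust
/// Naive estimate: (bound - min) / (max - min), with wrapping subtraction
/// standing in for unchecked arithmetic in a build without overflow checks.
/// (Hypothetical helper, not DataFusion's implementation.)
fn naive_lt_selectivity(min: i64, max: i64, bound: i64) -> f64 {
    bound.wrapping_sub(min) as f64 / max.wrapping_sub(min) as f64
}

/// Overflow-safe estimate: widths are computed with `abs_diff`, which
/// returns a `u64`, and the bound is clamped into [min, max] first.
/// (Hypothetical helper, not DataFusion's implementation.)
fn safe_lt_selectivity(min: i64, max: i64, bound: i64) -> Option<f64> {
    // An empty or single-point range gives no meaningful ratio.
    if min >= max {
        return None;
    }
    let clamped = bound.clamp(min, max);
    Some(clamped.abs_diff(min) as f64 / max.abs_diff(min) as f64)
}
```

With `min = i64::MIN`, `max = i64::MAX` and `bound = 0`, the naive version returns a value far above 1.0 (both subtractions wrap), while the safe version returns `Some(0.5)`. On a small range such as `0..=10` with `bound = 5`, both agree on 0.5.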