Dandandan opened a new issue, #22188:
URL: https://github.com/apache/datafusion/issues/22188

   ### Describe the bug
   
   `generate_series` and `range` panic with `capacity overflow` when given an
   integer range so large that the element count, multiplied by the element
   size, exceeds `isize::MAX` bytes. The panic comes from `Vec::reserve` inside
   the integer-range implementation and is hit during planning (constant
   folding of the table-valued function).
   
   ### To Reproduce
   
   ```rust
   use datafusion::prelude::SessionContext;
   
   #[tokio::main]
   async fn main() {
       let ctx = SessionContext::new();
       let _ = ctx
           .sql("SELECT generate_series(0, 9223372036854775807)")
           .await
           .unwrap()
           .create_physical_plan()
           .await;
   }
   ```
   
   Panic:
   
   ```
   thread 'main' panicked at .../alloc/src/raw_vec/mod.rs:28:5:
   capacity overflow
   ```
   
   Also reproduces with:
   - `SELECT range(0, 9223372036854775807)`
   - `SELECT range(9223372036854775807)`
   - `SELECT generate_series(-9223372036854775808, 9223372036854775807)`
   
   Bounded ranges like `SELECT generate_series(1, 100)` are fine.
   
   ### Expected behavior
   
   Return a planning/execution error along the lines of "range too large to
   materialize" (or, ideally, a streaming implementation that does not need to
   materialize the full sequence eagerly). The public SQL API should never
   panic on user-supplied SQL.
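
   As a rough illustration of the streaming alternative, the sketch below
   yields an inclusive integer series in fixed-size batches instead of
   reserving the whole sequence up front. `SeriesBatches` and its fields are
   hypothetical names, the step is fixed at 1 for brevity, and nothing here
   uses DataFusion's actual APIs.

   ```rust
   /// Hypothetical lazy series over `start..=end` (step fixed at 1).
   /// Each call to `next` materializes at most `batch_size` values.
   struct SeriesBatches {
       cursor: Option<i64>, // next value to emit; `None` once exhausted
       end: i64,
       batch_size: usize,
   }

   impl SeriesBatches {
       fn new(start: i64, end: i64, batch_size: usize) -> Self {
           assert!(batch_size > 0, "batch_size must be positive");
           // An empty series (start > end) starts out exhausted.
           let cursor = if start <= end { Some(start) } else { None };
           Self { cursor, end, batch_size }
       }
   }

   impl Iterator for SeriesBatches {
       type Item = Vec<i64>;

       fn next(&mut self) -> Option<Vec<i64>> {
           let first = self.cursor?;
           // Only this batch is allocated, never the full range.
           let batch: Vec<i64> = (first..=self.end).take(self.batch_size).collect();
           // Non-empty because `first <= end` and `batch_size > 0`.
           let last = *batch.last()?;
           // `last < end <= i64::MAX`, so `last + 1` cannot overflow.
           self.cursor = if last < self.end { Some(last + 1) } else { None };
           Some(batch)
       }
   }
   ```

   With this shape, `SeriesBatches::new(0, i64::MAX, 8192)` can be constructed
   and polled one batch at a time, so memory use is bounded by the batch size
   rather than by the length of the series.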
   
   ### Root cause
   
   
   [`datafusion/functions-nested/src/range.rs`](https://github.com/apache/datafusion/blob/main/datafusion/functions-nested/src/range.rs), in `generate_range_values`:
   
   ```rust
   // line 563-565   (step > 0 branch)
   let count =
       (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize;
   values.reserve(count);                                  // ← panics here
   
   // line 583-585   (step < 0 branch — identical pattern)
   let count =
       (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize;
   values.reserve(count);
   ```
   
   For `generate_series(0, i64::MAX, 1)`, `count` is `i64::MAX + 1 = 2^63`
   (about `9.2 × 10^18`; `saturating_add(1)` does not saturate here), which
   still fits in a `usize` on a 64-bit target. `Vec::<i64>::reserve` then needs
   `count * size_of::<i64>()` = `2^66` bytes, which exceeds `isize::MAX`, so it
   panics with `capacity overflow`.
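
   The arithmetic can be checked in isolation. The sketch below assumes a
   64-bit target and copies only the `count` expression from the snippet
   above; the helper names are made up for illustration and are not part of
   DataFusion. It uses `Vec::try_reserve`, which reports the same condition as
   an error instead of panicking.

   ```rust
   use std::collections::TryReserveError;

   /// Element count, using the same expression as the snippet above.
   /// The caller must ensure `step != 0`, otherwise the division panics.
   /// On a 32-bit target the `as usize` cast would truncate.
   fn element_count(start: i64, limit: i64, step: i64) -> usize {
       (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize
   }

   /// Bytes a `Vec<i64>` would need for `count` elements, or `None` if the
   /// multiplication overflows `usize`.
   fn bytes_needed(count: usize) -> Option<usize> {
       count.checked_mul(std::mem::size_of::<i64>())
   }

   /// Same request as `values.reserve(count)` on an empty vector, but
   /// fallible: an oversized request comes back as `Err` rather than a panic.
   fn try_reserve_i64(count: usize) -> Result<(), TryReserveError> {
       let mut values: Vec<i64> = Vec::new();
       values.try_reserve(count)
   }
   ```

   Here `element_count(0, i64::MAX, 1)` is `2^63`, `bytes_needed` returns
   `None` for it because `2^63 * 8` does not fit in a `usize`, and
   `try_reserve_i64` returns an error for the same reason.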
   
   ### Suggested fix
   
   Bound `count` at allocation time:
   
   ```rust
   const MAX_RANGE_ELEMENTS: usize = isize::MAX as usize / std::mem::size_of::<i64>();
   if count > MAX_RANGE_ELEMENTS {
       return exec_err!(
           "range too large: would produce {count} elements (max {MAX_RANGE_ELEMENTS})"
       );
   }
   values.reserve(count);
   ```
   
   A friendlier limit (say, 1 GiB / 8 B = 128 Mi, or about 134 million,
   elements, configurable) would also stop this from being a
   memory-exhaustion DoS.
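
   A configurable cap along those lines might look like the following sketch.
   `reserve_range` and `DEFAULT_MAX_RANGE_BYTES` are hypothetical names, and a
   plain `String` error stands in for DataFusion's `exec_err!` so the snippet
   compiles on its own. It also swaps `reserve` for `try_reserve`, so an
   allocation failure below the cap is reported as an error too.

   ```rust
   /// Hypothetical default cap: 1 GiB worth of `i64` values.
   const DEFAULT_MAX_RANGE_BYTES: usize = 1 << 30;

   /// Reserve room for `count` values, rejecting requests above `max_bytes`
   /// and returning an error (never panicking) if the allocation fails.
   fn reserve_range(
       values: &mut Vec<i64>,
       count: usize,
       max_bytes: usize,
   ) -> Result<(), String> {
       let max_elements = max_bytes / std::mem::size_of::<i64>();
       if count > max_elements {
           return Err(format!(
               "range too large: would produce {count} elements (max {max_elements})"
           ));
       }
       values
           .try_reserve(count)
           .map_err(|e| format!("failed to allocate range of {count} elements: {e}"))
   }
   ```

   Passing the limit as a parameter keeps the policy out of the allocation
   path, so it could be fed from a session-level setting if one were added.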
   
   ### Additional context
   
   Found by a `cargo fuzz` target (`fuzz/fuzz_targets/sql_physical_plan.rs`)
   seeded with SQL extracted from `datafusion/sqllogictest/test_files/`. The
   fuzzer triggered it from a mutated `generate_series` example by replacing a
   small numeric literal with `9223372036854775807` (`i64::MAX`).

