EeshanBembi opened a new pull request, #17449: URL: https://github.com/apache/datafusion/pull/17449
## Summary This PR fixes a panic in `UnionExec` when constructed with empty inputs, replacing the crash with proper error handling and descriptive error messages. **Fixes:** #17052 ## Problem When `UnionExec::new(vec![])` was called with an empty input vector, it would panic with: ``` thread '...' panicked at datafusion/physical-plan/src/union.rs:542:24: index out of bounds: the len is 0 but the index is 0 ``` This occurred because `union_schema()` directly accessed `inputs[0]` without checking if the array was empty. ## Solution ### Core Changes 1. **Made `UnionExec::new()` return `Result<Self>`**: - Added validation: returns error if `inputs.is_empty()` - Provides clear error message: `"UnionExec requires at least one input"` 2. **Made `union_schema()` return `Result<SchemaRef>`**: - Added empty input validation before accessing `inputs[0]` - Returns descriptive error: `"Cannot create union schema from empty inputs"` 3. **Updated all call sites** (7 files): - `physical_planner.rs` - Core DataFusion integration - `repartition/mod.rs` - Internal dependencies - 4 test files - Updated to handle `Result` return type ### Error Handling - **Before**: Index out of bounds panic (unhelpful) - **After**: Clear error messages that guide users ```rust // Before: panic! let union = UnionExec::new(vec![]); // PANIC! // After: proper error handling match UnionExec::new(vec![]) { Ok(_) => { /* use union */ } Err(e) => println!("Error: {}", e); // "UnionExec requires at least one input" } ``` ## Testing Added 4 comprehensive tests: 1. **`test_union_empty_inputs()`** - Verifies empty input validation 2. **`test_union_schema_empty_inputs()`** - Tests schema creation with empty inputs 3. **`test_union_single_input()`** - Ensures single input still works 4. **`test_union_multiple_inputs_still_works()`** - Verifies existing functionality unchanged **Test Results:** - ✅ All new tests pass - ✅ All existing union tests pass (8/8) - ✅ All physical planner integration tests pass ## Backward Compatibility - **Existing functionality unchanged** for valid inputs (≥1 input) - **Only adds error handling** for previously crashing invalid inputs - **API change**: `UnionExec::new()` now returns `Result<Self>` instead of `Self` This is a **breaking change** but justified because: 1. The previous behavior (panic) was incorrect 2. Empty inputs are invalid by design (no logical meaning) 3. Consistent with logical `Union` which requires ≥2 inputs 4. Better error handling improves user experience ## Files Changed - `datafusion/physical-plan/src/union.rs` - Core fix + tests (main changes) - `datafusion/core/src/physical_planner.rs` - Handle `Result` return - `datafusion/physical-plan/src/repartition/mod.rs` - Update internal calls - 4 test files - Update test utilities and test cases The fix provides robust error handling while maintaining all existing functionality for valid use cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org