Dandandan opened a new issue, #22184:
URL: https://github.com/apache/datafusion/issues/22184

   ### Describe the bug
   
   The SQL frontend panics on any compound identifier of 6 or more parts that 
doesn't match a registered column. Found by a libFuzzer target running over a 
seed corpus extracted from `datafusion/sqllogictest/test_files/`.
   
   ### To Reproduce
   
   ```rust
   use datafusion::prelude::SessionContext;
   
   #[tokio::main]
   async fn main() {
       let ctx = SessionContext::new();
       let _ = ctx.sql("SELECT zzz1.zzz2.zzz3.zzz4.zzz5.zzz6").await;
   }
   ```
   
   Panic:
   
   ```
   thread 'main' panicked at datafusion/sql/src/expr/identifier.rs:219:65:
   called `Result::unwrap()` on an `Err` value: Internal("Incorrect number of 
identifiers: 6")
   ```
   
   Triggers for **any** dotted identifier with ≥6 parts where no prefix matches 
a registered column. 5-part identifiers are correctly handled via a 
`not_impl_err!` early-return; 6+ parts fall through.
   
   ### Expected behavior
   
   Return a `not_impl_err!` (same as the 5-part case) or any other proper 
`Err`, not a panic. Public planner API should never panic on user-supplied SQL.
   
   ### Additional context
   
   Root cause is in 
[`datafusion/sql/src/expr/identifier.rs`](https://github.com/apache/datafusion/blob/main/datafusion/sql/src/expr/identifier.rs):
   
   ```rust
   // line 186
   if ids.len() == 5 {                                      // ← only catches 
len == 5
       not_impl_err!("compound identifier: {ids:?}")
   } else {
       // ... loop over outer schemas, may not return ...
   
       // line 217-219
       // Safe unwrap as column name can never be empty or exceed the bounds
       let (relation, column_name) =
           form_identifier(&ids[0..ids.len()]).unwrap();    // ← panics for len 
>= 5
       ...
   }
   ```
   
   `form_identifier` (line 282) returns `internal_err!("Incorrect number of 
identifiers: {}", idents.len())` for any `len > 4`, so the `.unwrap()` is 
unsound for `len >= 6` and the comment above it ("can never be empty or exceed 
the bounds") is incorrect.
   
   Minimal fixes (either works):
   - Change line 186 from `if ids.len() == 5` to `if ids.len() >= 5`, **or**
   - Change `.unwrap()` on line 219 to `?` and remove the misleading comment.
   
   ### Discovery
   
   Found by a `cargo fuzz` target (`fuzz/fuzz_targets/sql_physical_plan.rs`) 
seeded with SQL statements extracted from 
`datafusion/sqllogictest/test_files/`. The fuzzer triggered it within the first 
INITED pass once seeded — a `CopyPart` mutation chained nested-identifier seeds 
together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to