Dandandan opened a new issue, #22194:
URL: https://github.com/apache/datafusion/issues/22194

   ### Describe the bug
   
   `arrow_cast` accepts type strings that are not legal Arrow type 
combinations. The cast itself succeeds and the type propagates through the 
logical plan, but downstream operations panic in `arrow-array` with `not 
implemented: Unexpected data type Time32(µs)` (or similar).
   
   Per the Arrow spec, `Time32` only supports `Second` and `Millisecond`; 
`Time64` only supports `Microsecond` and `Nanosecond`. The other four 
combinations should be rejected.
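
   The rule above can be written as a small predicate. This is a minimal sketch: `TimeWidth`, the local `TimeUnit`, and `is_valid_time_type` are hypothetical stand-ins for illustration, not the arrow-rs types or an existing API.

   ```rust
   // Hypothetical stand-ins for Arrow's time unit and the two time widths;
   // these are NOT the arrow-rs `TimeUnit` / `DataType` types.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum TimeUnit {
       Second,
       Millisecond,
       Microsecond,
       Nanosecond,
   }

   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum TimeWidth {
       Time32,
       Time64,
   }

   /// Returns true only for the four legal width/unit pairs:
   /// Time32 with Second or Millisecond, Time64 with Microsecond or Nanosecond.
   fn is_valid_time_type(width: TimeWidth, unit: TimeUnit) -> bool {
       matches!(
           (width, unit),
           (TimeWidth::Time32, TimeUnit::Second | TimeUnit::Millisecond)
               | (TimeWidth::Time64, TimeUnit::Microsecond | TimeUnit::Nanosecond)
       )
   }
   ```

   Of the eight possible pairs, the predicate accepts four and rejects the four listed in this report.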
   
   ### To Reproduce
   
   ```rust
   use datafusion::prelude::SessionContext;
   
   #[tokio::main]
   async fn main() {
       let ctx = SessionContext::new();
       let _ = ctx
           .sql("SELECT arrow_cast(0, 'Time32(Microsecond)') + 1")
           .await
           .unwrap()
           .create_physical_plan()
           .await;
   }
   ```
   
   Panic:
   
   ```
   thread 'main' panicked at .../arrow-array-58.3.0/src/array/mod.rs:986:15:
   not implemented: Unexpected data type Time32(µs)
   ```
   
   All four invalid combinations panic when used with arithmetic:
   
   ```sql
   SELECT arrow_cast(0, 'Time32(Microsecond)') + 1
   SELECT arrow_cast(0, 'Time32(Nanosecond)')  + 1
   SELECT arrow_cast(0, 'Time64(Second)')      + 1
   SELECT arrow_cast(0, 'Time64(Millisecond)') + 1
   ```
   
   The original fuzzer find was:
   
   ```sql
   SELECT arrow_cast('5:00', 'Time32(Second)') - arrow_cast('03:00', 'Time32(Microsecond)')
   ```
   
   ### Expected behavior
   
   `arrow_cast` should reject the four invalid `Time(Unit)` combinations at 
planning time with a `plan_err!` such as:
   
   > Invalid Arrow type combination: `Time32` only supports `Second` and 
`Millisecond`. Use `Time64(Microsecond)` for sub-millisecond precision.
   
   The public SQL API should never panic on user-supplied SQL, even with 
obviously malformed type strings.
   
   ### Root cause
   
   `arrow_cast` constructs a `DataType::Time32(TimeUnit::Microsecond)` (or 
similar invalid combo) from the user-supplied string without validating against 
Arrow's type-system rules. Downstream `arrow-array` code paths assume the type 
is well-formed and panic via `unimplemented!()` when they see the illegal 
combination.
   
   Two-sided fix:
   - **DataFusion**: validate `Time32`/`Time64` × `TimeUnit` combinations when 
parsing the target type in `arrow_cast`.
   - **arrow-rs** (separate): even if a malformed type reaches array code, it 
should return an `ArrowError` rather than panicking via `unimplemented!()`.
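
   As a rough illustration of the DataFusion-side check, the sketch below validates a type string and returns an error instead of letting the invalid type through. It assumes a hypothetical helper `check_time_type_string` that works on the raw string and uses `String` as a stand-in error type; the real fix would presumably validate the parsed `DataType` and return a `DataFusionError` via `plan_err!`.

   ```rust
   /// Hypothetical helper, not part of DataFusion: rejects the four invalid
   /// Time32/Time64 unit combinations in a type string such as
   /// "Time32(Microsecond)". Anything that is not a Time32/Time64 type string
   /// is passed through as Ok(()), since it is outside the scope of this check.
   fn check_time_type_string(s: &str) -> Result<(), String> {
       // Split "Name(Unit)" into "Name" and "Unit)".
       let (name, rest) = match s.trim().split_once('(') {
           Some(parts) => parts,
           None => return Ok(()),
       };
       // Drop the closing parenthesis; leave malformed input to the real parser.
       let unit = match rest.trim().strip_suffix(')') {
           Some(u) => u.trim(),
           None => return Ok(()),
       };
       match (name.trim(), unit) {
           ("Time32", "Microsecond" | "Nanosecond") => Err(format!(
               "Invalid Arrow type combination: Time32 only supports Second and \
                Millisecond. Use Time64({}) instead.",
               unit
           )),
           ("Time64", "Second" | "Millisecond") => Err(format!(
               "Invalid Arrow type combination: Time64 only supports Microsecond and \
                Nanosecond. Use Time32({}) instead.",
               unit
           )),
           _ => Ok(()),
       }
   }
   ```

   With this check, `check_time_type_string("Time32(Microsecond)")` yields an `Err` that points the user at `Time64(Microsecond)`, while `"Time32(Second)"` and unrelated strings such as `"Int32"` pass.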
   
   ### Additional context
   
   Found by a `cargo fuzz` target (`fuzz/fuzz_targets/sql_physical_plan.rs`) 
seeded with SQL from `datafusion/sqllogictest/test_files/`. The fuzzer mutated 
an existing `arrow_cast(..., 'Time32(Second)')` example by changing `Second` → 
`Microsecond`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

