Dandandan opened a new issue, #22194:
URL: https://github.com/apache/datafusion/issues/22194
### Describe the bug
`arrow_cast` accepts type strings that are not legal Arrow type
combinations. The cast itself succeeds and the type propagates through the
logical plan, but downstream operations panic in `arrow-array` with
`not implemented: Unexpected data type Time32(µs)` (or similar).
Per the Arrow spec, `Time32` only supports `Second` and `Millisecond`;
`Time64` only supports `Microsecond` and `Nanosecond`. The other four
combinations should be rejected.
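The rule above can be written as a single predicate. The following is a minimal sketch using local stand-in enums (`TimeUnit`, `TimeType`) rather than the real `arrow` types, so it compiles without any dependencies. The name `is_legal_time_type` is hypothetical and is not an existing Arrow or DataFusion function.

```rust
/// Local stand-in for Arrow's `TimeUnit` (illustration only, not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Local stand-in for the two Arrow time-of-day types (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeType {
    Time32(TimeUnit),
    Time64(TimeUnit),
}

/// Hypothetical helper: returns `true` only for the four combinations
/// that the Arrow spec allows, and `false` for the other four.
pub fn is_legal_time_type(ty: TimeType) -> bool {
    matches!(
        ty,
        TimeType::Time32(TimeUnit::Second | TimeUnit::Millisecond)
            | TimeType::Time64(TimeUnit::Microsecond | TimeUnit::Nanosecond)
    )
}
```

With this predicate, `Time32(Second)` and `Time64(Nanosecond)` pass, while `Time32(Microsecond)` and `Time64(Second)` are rejected.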
### To Reproduce
```rust
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    let _ = ctx
        .sql("SELECT arrow_cast(0, 'Time32(Microsecond)') + 1")
        .await
        .unwrap()
        .create_physical_plan()
        .await;
}
```
Panic:
```
thread 'main' panicked at .../arrow-array-58.3.0/src/array/mod.rs:986:15:
not implemented: Unexpected data type Time32(µs)
```
All four invalid combinations panic when used with arithmetic:
```sql
SELECT arrow_cast(0, 'Time32(Microsecond)') + 1
SELECT arrow_cast(0, 'Time32(Nanosecond)') + 1
SELECT arrow_cast(0, 'Time64(Second)') + 1
SELECT arrow_cast(0, 'Time64(Millisecond)') + 1
```
The original fuzzer find was:
```sql
SELECT arrow_cast('5:00', 'Time32(Second)') - arrow_cast('03:00', 'Time32(Microsecond)')
```
### Expected behavior
`arrow_cast` should reject the four invalid `Time(Unit)` combinations at
planning time with a `plan_err!` such as:
> Invalid Arrow type combination: `Time32` only supports `Second` and
`Millisecond`. Use `Time64(Microsecond)` for sub-millisecond precision.
The public SQL API should never panic on user-supplied SQL, even with
obviously malformed type strings.
### Root cause
`arrow_cast` constructs a `DataType::Time32(TimeUnit::Microsecond)` (or
similar invalid combo) from the user-supplied string without validating against
Arrow's type-system rules. Downstream `arrow-array` code paths assume the type
is well-formed and panic via `unimplemented!()` when they see the illegal
combination.
Two-sided fix:
- **DataFusion**: validate `Time32`/`Time64` × `TimeUnit` combinations when
parsing the target type in `arrow_cast`.
- **arrow-rs** (separate): even if a malformed type reaches array code, it
should return an `ArrowError` rather than panic via `unimplemented!()`.
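As a rough illustration of the DataFusion-side check, the sketch below validates a `Time32(...)`/`Time64(...)` type string and returns an error instead of constructing an illegal type. It is not the actual `arrow_cast` parser: `validate_time_type` is a hypothetical helper, it uses a plain `String` error in place of `plan_err!`, and the `Time64` message is an assumed counterpart to the `Time32` message suggested above.

```rust
/// Hypothetical helper (not DataFusion's real parser): checks a
/// `Time32(<unit>)` / `Time64(<unit>)` type string and returns the
/// bit width and unit name, or a plan-style error message.
pub fn validate_time_type(s: &str) -> Result<(u8, &str), String> {
    let s = s.trim();

    // Split off the type name and keep the text after the opening parenthesis.
    let (bits, rest) = if let Some(rest) = s.strip_prefix("Time32(") {
        (32u8, rest)
    } else if let Some(rest) = s.strip_prefix("Time64(") {
        (64u8, rest)
    } else {
        return Err(format!("Not a Time32/Time64 type string: {}", s));
    };

    // The unit is everything before the closing parenthesis.
    let unit = rest
        .strip_suffix(')')
        .ok_or_else(|| format!("Missing closing ')' in type string: {}", s))?;

    match (bits, unit) {
        // The four legal combinations per the Arrow spec.
        (32, "Second" | "Millisecond") | (64, "Microsecond" | "Nanosecond") => Ok((bits, unit)),
        // The four illegal combinations: return an error, never panic.
        (32, "Microsecond" | "Nanosecond") => Err(
            "Invalid Arrow type combination: Time32 only supports Second and Millisecond. \
             Use Time64(Microsecond) for sub-millisecond precision."
                .to_string(),
        ),
        // Assumed wording, mirroring the Time32 message.
        (64, "Second" | "Millisecond") => Err(
            "Invalid Arrow type combination: Time64 only supports Microsecond and Nanosecond. \
             Use Time32(Second) or Time32(Millisecond) instead."
                .to_string(),
        ),
        _ => Err(format!("Unknown time unit: {}", unit)),
    }
}
```

Calling `validate_time_type("Time32(Microsecond)")` yields an `Err` carrying the message, which a caller could surface at planning time. The same return-an-error shape would apply on the arrow-rs side in place of `unimplemented!()`.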
### Additional context
Found by a `cargo fuzz` target (`fuzz/fuzz_targets/sql_physical_plan.rs`)
seeded with SQL from `datafusion/sqllogictest/test_files/`. The fuzzer mutated
an existing `arrow_cast(..., 'Time32(Second)')` example by changing `Second` →
`Microsecond`.