Jefffrey opened a new issue, #19458:
URL: https://github.com/apache/datafusion/issues/19458
This issue serves to document some existing behaviour, as well as to discuss
whether we can change it to something more intuitive/ergonomic.
# Problem
When using `TypeSignature::Coercible`, there is some interesting behaviour
when we have `DataType::Null`, `DataType::Dictionary` and
`DataType::RunEndEncoded` passed in.
Given a function with signature like so:
```rust
Signature::coercible(
    vec![Coercion::new_exact(TypeSignatureClass::Native(logical_int64()))],
    Volatility::Immutable
)
```
It expects only a single `Int64` argument. However, if we pass in a
`DataType::Null` or `Dictionary(Int8, Int64)` argument, it is actually
accepted and in fact cast to `Int64` (the function's invoke method would see
an input array of type `Int64`; it wouldn't see arrays of type `Null` or
`Dictionary(Int8, Int64)`).
This is the same for the following signature:
```rust
Signature::coercible(
    vec![Coercion::new_implicit(
        TypeSignatureClass::Native(logical_int64()),
        vec![],
        NativeType::Int64,
    )],
    Volatility::Immutable
)
```
**If a signature specifies `TypeSignatureClass::Native`, then not only is the
native type accepted, but `Null` and Dictionary/RunEndEncoded types of
this native type are also allowed in (and cast to the specified type).**
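
To make the rule above concrete, here is a minimal toy model of it. It is a sketch only: `ToyType`, `value_type` and `coerce_native` are hypothetical names invented for illustration and are not part of DataFusion; the model simply encodes the outcome described in this issue, not the actual coercion code.

```rust
// Toy model only: hypothetical stand-in types, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Null,
    Int8,
    Int64,
    Utf8,
    // Dictionary(key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
    // RunEndEncoded(run-end type, value type)
    RunEndEncoded(Box<ToyType>, Box<ToyType>),
}

/// Strips Dictionary/RunEndEncoded wrappers to find the underlying value type.
fn value_type(t: &ToyType) -> &ToyType {
    match t {
        ToyType::Dictionary(_, v) | ToyType::RunEndEncoded(_, v) => value_type(v),
        other => other,
    }
}

/// Models the outcome described for `TypeSignatureClass::Native(target)`:
/// returns the type the function body would see, or `None` if the input is rejected.
fn coerce_native(input: &ToyType, target: &ToyType) -> Option<ToyType> {
    if *input == ToyType::Null || value_type(input) == target {
        // Null and encoded inputs are cast, so the function only sees `target`.
        Some(target.clone())
    } else {
        None
    }
}
```

In this model, `coerce_native` maps `Null`, `Int64` and `Dictionary(Int8, Int64)` all to `Int64`, while an unrelated type such as `Utf8` is rejected.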
## Using `TypeSignatureClass::Integer`
If we instead use any of these APIs:
```rust
Signature::coercible(
    vec![Coercion::new_exact(TypeSignatureClass::Integer)],
    Volatility::Immutable
)
```
```rust
Signature::coercible(
    vec![Coercion::new_implicit(
        TypeSignatureClass::Integer,
        vec![],
        NativeType::Int64,
    )],
    Volatility::Immutable
)
```
That is, if we use `TypeSignatureClass::Integer`, which encompasses the type
we're looking for, then Null/Dictionary/REE types are passed through
**without** being cast. Only if we use `TypeSignatureClass::Native` do they
get cast.
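
The pass-through behaviour can be sketched with a small toy model. As before, this is an illustration under stated assumptions: `ToyType`, `is_integer_like` and `coerce_integer_class` are hypothetical names, not DataFusion APIs, and only two integer widths are modelled.

```rust
// Toy model only: hypothetical stand-in types, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Null,
    Int8,
    Int64,
    Utf8,
    // Dictionary(key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
    // RunEndEncoded(run-end type, value type)
    RunEndEncoded(Box<ToyType>, Box<ToyType>),
}

/// True if the type is an integer, or an encoding whose values are integers.
fn is_integer_like(t: &ToyType) -> bool {
    match t {
        ToyType::Int8 | ToyType::Int64 => true,
        ToyType::Dictionary(_, v) | ToyType::RunEndEncoded(_, v) => is_integer_like(v),
        _ => false,
    }
}

/// Models the outcome described for `TypeSignatureClass::Integer`:
/// accepted inputs reach the function unchanged (no cast is applied).
fn coerce_integer_class(input: &ToyType) -> Option<ToyType> {
    if *input == ToyType::Null || is_integer_like(input) {
        Some(input.clone())
    } else {
        None
    }
}
```

Here a `Dictionary(Int8, Int64)` input comes back as `Dictionary(Int8, Int64)`, and `Null` comes back as `Null`, so the function body has to be prepared for both.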
# Why is this a problem?
This is awkward for function implementations: depending on which API they
use, they may or may not need to provide an implementation that handles
Null/Dictionary/REE type arrays. It is subtle behaviour that is hard
to spot without appropriate testing.
# What to do
- See if we can make this behaviour more obvious
- Enhance the signature API to tune this behaviour; let functions choose whether:
  - Dictionary/REE arrays are allowed in as-is (function implementation must
    handle these array types)
  - Dictionary/REE arrays are materialized (function implementation doesn't
    need to consider Dictionary/REE array types as they'll be cast)
- Can we universally handle `Null` type arrays? The majority of functions likely
  just return null anyway, and it's tedious to implement this handling per function
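
One possible shape for such a knob, sketched as a self-contained toy. Everything here is hypothetical: `EncodingPolicy`, `Planned` and `plan_argument` do not exist in DataFusion, and the universal `Null` short-circuit is just one way the last bullet could be answered.

```rust
// Hypothetical design sketch; none of these names exist in DataFusion.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Null,
    Int8,
    Int64,
    Dictionary(Box<ToyType>, Box<ToyType>),
    RunEndEncoded(Box<ToyType>, Box<ToyType>),
}

/// Per-function choice of how encoded arguments are delivered.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EncodingPolicy {
    /// Dictionary/REE arrays are passed through; the function must handle them.
    PassThrough,
    /// Dictionary/REE arrays are cast to their value type before invocation.
    Materialize,
}

#[derive(Debug, PartialEq)]
enum Planned {
    /// Invoke the function with an argument of this type.
    Invoke(ToyType),
    /// Skip the function body and produce an all-null result.
    ReturnNull,
}

/// Decides what the function would see for one argument under a given policy.
fn plan_argument(input: &ToyType, policy: EncodingPolicy) -> Planned {
    match (input, policy) {
        // Null is handled once, here, instead of in every function.
        (ToyType::Null, _) => Planned::ReturnNull,
        // Under Materialize, unwrap the encoding and plan for the value type.
        (ToyType::Dictionary(_, v), EncodingPolicy::Materialize)
        | (ToyType::RunEndEncoded(_, v), EncodingPolicy::Materialize) => {
            plan_argument(v, policy)
        }
        // Everything else reaches the function unchanged.
        (other, _) => Planned::Invoke(other.clone()),
    }
}
```

With an explicit policy like this, the choice between handling and materializing encoded arrays would be visible in the signature rather than implied by which `TypeSignatureClass` variant happens to be used.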