Jefffrey opened a new issue, #19458:
URL: https://github.com/apache/datafusion/issues/19458

   This issue servers to document some existing behaviour, as well as discuss 
if we can change this to something more intuitive/ergonomic.
   
   # Problem
   
   When using `TypeSignature::Coercible`, there is some interesting behaviour 
when we have `DataType::Null`, `DataType::Dictionary` and 
`DataType::RunEndEncoded` passed in.
   
   Given a function with signature like so:
   
   ```rust
   Signature::coercible(
       vec![Coercion::new_exact(TypeSignatureClass::Native(logical_int64())],
       Volatility::Immutable
   )
   ```
   
   It expects only a single `Int64` argument. However, if we pass in 
`DataType::Null` or `Dictionary(Int8, Int64)` argument, these are actually 
valid and in-fact are casted to `Int64` (the function's invoke method would see 
the input array of type `Int64`; it wouldn't see arrays of type `Null` or 
`Dictionary(Int8, Int64)`.
   
   This is the same for the following signature:
   
   ```rust
   Signature::coercible(
       vec![Coercion::new_implicit(
           TypeSignatureClass::Native(logical_int64()),
           vec![],
           NativeType::Int64,
       )],
       Volatility::Immutable
   )
   ```
   
   **If a signature specifies `TypeSignatureClass::Native` then not only is the 
native type accepted, but also `Null` and Dictionary/RunEndEncoded types of 
this native type are allowed in (but casted to the specified type).**
   
   ## Using `TypeSignatureClass::Integer`
   
   If we instead use any of these APIs:
   
   ```rust
   Signature::coercible(
       vec![Coercion::new_exact(TypeSignatureClass::Integer],
       Volatility::Immutable
   )
   ```
   
   ```rust
   Signature::coercible(
       vec![Coercion::new_implicit(
           TypeSignatureClass::Integer,
           vec![],
           NativeType::Int64,
       )],
       Volatility::Immutable
   )
   ```
   
   That is, using `TypeSignatureClass::Integer` which encompasses the type 
we're looking for, then Null/Dictionary/REE types are passed through 
**without** being casted. Only if we use `TypeSignatureClass::Native` do they 
get casted.
   
   # Why is this a problem?
   
   It makes it annoying for function implementations to handle, as depending on 
which API they use they must provide an implementation that considers 
Null/Dictionary/REE type arrays or not. This is subtle behaviour that is hard 
to spot without appropriate testing.
   
   # What to do
   
   - See if we can make this behaviour more obvious
   - Enhance signature API to tune this behaviour; let functions choose if:
     - Dictionary/REE arrays are allowed as in (function implementation must 
handle these array types)
     - Dictionary/REE arrays are materialized (function implementation doesn't 
need to consider Dictionary/REE array types as they'll be casted)
   - Can we universally handle `Null` type arrays? Majority of functions likely 
just return null anyway, it's tedious to implement this handling per function


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to