LantaoJin opened a new pull request, #59:
URL: https://github.com/apache/datafusion-java/pull/59

   ## Which issue does this PR close?
   
   - Closes #58.
   
   ## Rationale for this change
   
   `ScalarFunction.argTypes()` returned `List<ArrowType>` and `returnType()` 
returned `ArrowType`. In Java Arrow, `ArrowType` is a *leaf marker* for the 
type kind: it is self-describing for primitives like `Int32` or `Float64`, but 
for nested types (`List`, `Struct`, `Map`, `FixedSizeList`) the element / 
member / key / value types live on the parent `Field`'s `children` list, not 
inside `ArrowType`. `ArrowType.List` is literally a no-field marker class.
   
   A Java UDF author therefore had no way to declare a typed nested signature. 
Returning `List.of(new ArrowType.List())` from `argTypes()` failed at 
registration time:
   
   ```
   IllegalArgumentException: Lists have one child Field. Found: none
     at SessionContext.serializeSchemaIpc(SessionContext.java:398)
     at SessionContext.registerUdf(SessionContext.java:391)
   ```
   
   This blocked the entire family of nested-type UDFs that exist as built-ins 
in DataFusion's `datafusion-functions-nested` crate (`array_length`, 
`cardinality`, `array_has`, `array_position`, `flatten`, `map_keys`, 
`map_values`, `arrays_zip`, ...). Anyone porting Spark UDFs over `ArrayType` / 
`StructType` / `MapType` columns to DataFusion-Java hit this error on the first 
attempt.
   
   The Rust API does not have this problem because `DataType::List(Arc<Field>)` 
carries the child field inline. Switching the Java interface from `ArrowType` 
to `Field` is the structural mirror: `Field` is the only type that can carry 
children, so it's the type the interface has always needed to use.
   
   ## What changes are included in this PR?
   
   See the commit log.
   
   ## Are these changes tested?
   
   Yes -- 5 new tests over and above the existing 12.
   
   ## Are there any user-facing changes?
   
   Yes -- a source-breaking signature change to the public `ScalarFunction` 
interface. Existing primitive UDFs become slightly more verbose:
   
   ```java
   // Before:
   public List<ArrowType> argTypes() { return List.of(INT32); }
   public ArrowType returnType() { return INT32; }
   
   // After:
   public List<Field> argFields() { return List.of(Field.nullable("arg0", INT32)); }
   public Field returnField() { return Field.nullable("return", INT32); }
   ```
   
   The repo is pre-release, which makes this the right time to tighten the 
interface before downstream callers accumulate.