LantaoJin opened a new issue, #58:
URL: https://github.com/apache/datafusion-java/issues/58
### Describe the bug
`ScalarFunction.argTypes()` returns `List<ArrowType>` and `returnType()`
returns `ArrowType`
(`core/src/main/java/org/apache/datafusion/ScalarFunction.java:47, :50`). Java
Arrow's `ArrowType` is a *leaf marker* for the type kind: for primitives like
`Int32` or `Float64` it is self-describing, but for nested types (`List`,
`Struct`, `Map`, `FixedSizeList`) the element / member / key / value types live
on the parent `Field`'s `children` list, not inside `ArrowType` itself.
`ArrowType.List` is literally a no-field marker class.
That mismatch means a Java UDF author has no way to declare a typed nested
signature. The closest they can write is:
```java
public List<ArrowType> argTypes() {
  return List.of(new ArrowType.List()); // says "list" -- cannot say "of Int32"
}
```
When this is passed to `SessionContext.registerUdf(ScalarUdf)` the
registration path at
`core/src/main/java/org/apache/datafusion/SessionContext.java:385-389`
constructs the signature schema as:
```java
fields.add(new Field("return", FieldType.nullable(returnType), null));
for (int i = 0; i < argTypes.size(); i++) {
  fields.add(new Field("arg" + i, FieldType.nullable(argTypes.get(i)), null));
}
```
The `null` children list is the bug: Arrow's IPC writer rejects the
malformed `List` field during `serializeSchemaIpc(...)` before the schema ever
crosses JNI. The user sees a low-level `IllegalArgumentException: Lists have
one child Field. Found: none`.
This blocks the entire family of nested-type UDFs that exist as built-ins in
DataFusion's `datafusion-functions-nested` crate (`array_length`,
`cardinality`, `array_has`, `array_position`, `flatten`, `map_keys`,
`map_values`, `arrays_zip`, ...). Anyone porting Spark UDFs over `ArrayType` /
`StructType` / `MapType` columns to DataFusion-Java hits this on the first
attempt.
The Rust API does not have this problem: `DataType::List(Arc<Field>)`
carries the child field inline, so
`Signature::exact(vec![DataType::List(Arc::new(Field::new("item",
DataType::Int32, true)))], ...)` round-trips with full structure.
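For comparison, here is a minimal sketch of how the same `List<Int32>` shape is expressed in Arrow Java when a `Field` is available: the element type goes in the `children` list rather than in `ArrowType`. The class `NestedFieldSketch` and the method `listOfInt32` are hypothetical names used only for illustration, and the child name `"item"` is an assumption chosen to mirror the Rust snippet.

```java
import java.util.List;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;

/** Hypothetical helper for illustration; not part of datafusion-java. */
final class NestedFieldSketch {

  /** Builds a nullable list field whose element type is a nullable signed 32-bit int. */
  static Field listOfInt32(String name) {
    // The child name "item" is an assumption that mirrors the Rust example.
    Field element = new Field("item", FieldType.nullable(new ArrowType.Int(32, true)), null);
    // ArrowType.List takes no parameters, so the element type is passed as a child Field.
    return new Field(name, FieldType.nullable(ArrowType.List.INSTANCE), List.of(element));
  }

  public static void main(String[] args) {
    Field arg0 = listOfInt32("arg0");
    System.out.println(arg0.getType());                      // the bare list marker
    System.out.println(arg0.getChildren().get(0).getType()); // the element type
  }
}
```

Because `ScalarFunction` exposes only `ArrowType` values, a UDF author currently has no place to hand a `Field` like this to the registration path.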
### To Reproduce
```java
static final class ListLength implements ScalarFunction {
  public String name() { return "java_list_length"; }
  public List<ArrowType> argTypes() { return List.of(new ArrowType.List()); }
  public ArrowType returnType() { return new ArrowType.Int(32, true); }
  public Volatility volatility() { return Volatility.IMMUTABLE; }
  public FieldVector evaluate(
      BufferAllocator allocator, List<FieldVector> args, int rowCount) {
    /* ... */
  }
}
new SessionContext().registerUdf(new ScalarUdf(new ListLength()));
// throws:
// IllegalArgumentException: Lists have one child Field. Found: none
// at SessionContext.serializeSchemaIpc(SessionContext.java:398)
// at SessionContext.registerUdf(SessionContext.java:391)
```
### Expected behavior
A UDF whose argument or return type is a nested Arrow type registers
successfully and is callable from SQL with full element-type information
preserved end-to-end (Java → JNI → Rust `Signature::exact`).
### Additional context
_No response_