andygrove opened a new issue, #4472:
URL: https://github.com/apache/datafusion-comet/issues/4472

   ## Describe the bug
   
   Spark's `size(expr)` accepts both `ArrayType` and `MapType` inputs 
(`Size.inputTypes = Seq(TypeCollection(ArrayType, MapType))` in 
`collectionOperations.scala`, identical across 3.4.3 / 3.5.8 / 4.0.1 / 4.1.1). 
Comet's `CometSize` only supports `ArrayType`; for `MapType` it returns 
`Unsupported(Some("size does not support map inputs"))` and falls back to Spark.
   
   Surfaced by the collection-expressions audit in apache/datafusion-comet#4471.
   
   ## Steps to reproduce
   
   ```sql
   CREATE TABLE t(m map<string, int>) USING parquet;
   INSERT INTO t VALUES (map('a', 1, 'b', 2));
   SELECT size(m) FROM t;
   ```
   
   Spark returns `2`. Comet falls back to Spark for the entire plan node.
   
   ## Expected behavior
   
   Native support for `size(<map>)`. Arrow's `MapArray` stores per-row 
offsets, so each row's entry count is available and can drive the same 
`numElements` semantics Spark uses. Null handling should follow the 
`legacySizeOfNull` config, as `CometSize` already does for arrays.
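   
   A minimal sketch of the per-row logic, assuming the map column is 
represented by plain slices that mirror Arrow's offsets and validity 
buffers. The function name `map_sizes` and its parameters are hypothetical; 
this is neither Comet's existing size UDF nor the arrow-rs API, only an 
illustration of the intended semantics:
   
   ```rust
   /// Entry count per row for a map column laid out like an Arrow MapArray.
   ///
   /// `offsets` holds one more element than there are rows; the entries of
   /// row `i` occupy the range `offsets[i]..offsets[i + 1]`.
   /// `validity[i]` is false when row `i` is NULL.
   ///
   /// Hypothetical helper: mirrors Spark's behaviour of returning -1 for a
   /// NULL input when legacy sizeOfNull handling is on, and NULL otherwise.
   fn map_sizes(
       offsets: &[i32],
       validity: &[bool],
       legacy_size_of_null: bool,
   ) -> Vec<Option<i32>> {
       assert_eq!(offsets.len(), validity.len() + 1);
       validity
           .iter()
           .enumerate()
           .map(|(row, &is_valid)| {
               if is_valid {
                   // Number of key/value entries in this row.
                   Some(offsets[row + 1] - offsets[row])
               } else if legacy_size_of_null {
                   // Legacy behaviour: size(NULL) is -1.
                   Some(-1)
               } else {
                   // Non-legacy behaviour: size(NULL) is NULL.
                   None
               }
           })
           .collect()
   }
   ```
   
   For three rows `{a: 1, b: 2}`, `NULL`, `{}` (offsets `[0, 2, 2, 2]`, 
validity `[true, false, true]`), this yields `[Some(2), Some(-1), Some(0)]` 
with the legacy flag on and `[Some(2), None, Some(0)]` with it off.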
   
   ## Additional context
   
   - Serde: `CometSize` in 
`spark/src/main/scala/org/apache/comet/serde/arrays.scala` (line ~640)
   - Native: routes through the `size` scalar function in 
`comet_scalar_funcs.rs`; the size UDF would need a `MapType` branch.
   - Related: `cardinality` is an alias for `size` in Spark and would benefit 
from the same fix.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

