marvelshan commented on code in PR #4580:
URL: https://github.com/apache/datafusion-comet/pull/4580#discussion_r3384951917
##########
native/spark-expr/src/array_funcs/size.rs:
##########
@@ -198,6 +198,14 @@ fn spark_size_scalar(scalar: &ScalarValue) ->
Result<ScalarValue, DataFusionErro
Ok(ScalarValue::Int32(Some(len)))
}
}
+ ScalarValue::Map(array) => {
+ if array.is_null(0) {
+ Ok(ScalarValue::Int32(Some(-1)))
Review Comment:
`spark.sql.legacy.sizeOfNull` doesn't affect the native return value. The
serde layer already wraps `size(child)` into `CASE WHEN isNotNull(child) THEN
size(child) ELSE (-1|null) END`, so the null branch is handled by CaseWhen's
ELSE literal and native `size()` only receives non-null inputs. The
ConfigMatrix in `size.sql` already covers both behaviors.
##########
spark/src/test/scala/org/apache/comet/CometMapExpressionSuite.scala:
##########
@@ -126,34 +126,35 @@ class CometMapExpressionSuite extends CometTestBase {
}
}
- test("fallback for size with map input") {
- withTempDir { dir =>
- withTempView("t1") {
- val path = new Path(dir.toURI.toString, "test.parquet")
- makeParquetFileAllPrimitiveTypes(path, dictionaryEnabled = true, 100)
- spark.read.parquet(path.toString).createOrReplaceTempView("t1")
-
- // Use column references in maps to avoid constant folding
- checkSparkAnswerAndFallbackReason(
- sql("SELECT size(case when _2 < 0 then map(_8, _9) else map() end)
from t1"),
- "size does not support map inputs")
+ test("size with map input") {
+ withTempPath { dir =>
+ withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
+ val df = spark
+ .range(100)
+ .select(
+ when(col("id") > 1, map(col("id"), when(col("id") > 2, col("id"))))
+ .alias("map1"),
+ when(col("id") > 5, map(lit("a"), col("id"), lit("b"), col("id") +
1))
+ .alias("map2"))
+ df.write.parquet(dir.toString())
}
- }
- }
-
- // fails with "map is not supported"
- ignore("size with map input") {
Review Comment:
Yes, it has been promoted from `ignore` to `test("size with map input")`
using `checkSparkAnswerAndOperator`, and the original test using
`makeParquetFileAllPrimitiveTypes` is also restored with
`checkSparkAnswerAndOperator` instead of `checkSparkAnswerAndFallbackReason`. A
separate `test("size with map input - v2 reader")` covers the v2 reader path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]