andygrove opened a new issue, #21512:
URL: https://github.com/apache/datafusion/issues/21512

   ### Describe the bug
   
   The `datafusion-spark` implementation of `array_repeat` incorrectly returns 
NULL when the first argument (element) is NULL. In Apache Spark, only a NULL 
count (second argument) produces a NULL result — a NULL element should be 
repeated into the array.
   
   ### To Reproduce
   
   **PySpark (correct behavior):**
   ```sql
   SELECT array_repeat(NULL, 2);    -- [NULL, NULL]
   SELECT array_repeat(NULL, 1);    -- [NULL]
   SELECT array_repeat(NULL, 0);    -- []
   SELECT array_repeat('x', NULL);  -- NULL
   ```
   
   **DataFusion-spark (incorrect behavior):**
   ```sql
   SELECT array_repeat(NULL, 2);    -- NULL (should be [NULL, NULL])
   SELECT array_repeat(NULL, 1);    -- NULL (should be [NULL])
   SELECT array_repeat(NULL, 0);    -- NULL (should be [])
   SELECT array_repeat('x', NULL);  -- NULL (correct)
   ```
   
   The `.slt` test at 
`datafusion/sqllogictest/test_files/spark/array/array_repeat.slt` line 59 has 
the wrong expected value (`NULL` instead of `[NULL, NULL]`). Line 79 also has a 
wrong expected value for the `(NULL, 1)` row (`NULL` instead of `[NULL]`).
   
   ### Expected behavior
   
   | Expression | Spark result | datafusion-spark result |
   |---|---|---|
   | `array_repeat('x', 3)` | `[x, x, x]` | `[x, x, x]` ✓ |
   | `array_repeat(NULL, 2)` | `[NULL, NULL]` | `NULL` ✗ |
   | `array_repeat(NULL, 1)` | `[NULL]` | `NULL` ✗ |
   | `array_repeat(NULL, 0)` | `[]` | `NULL` ✗ |
   | `array_repeat('x', NULL)` | `NULL` | `NULL` ✓ |
   
   ### Additional context
   
   **Root cause:** `SparkArrayRepeat::spark_array_repeat` in 
`datafusion/spark/src/function/array/repeat.rs` applies `compute_null_mask` to 
all arguments, so a row is marked NULL if *any* argument is NULL. But 
`array_repeat` should only return NULL when the count (second argument) is 
NULL. A NULL element should be passed through to DataFusion's underlying 
`array_repeat`, which correctly repeats it.
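
   To make the difference concrete, here is a minimal standalone sketch of the 
two null-mask strategies. It uses plain `Option` values in place of Arrow 
arrays, and the helper names `any_null_mask` and `count_only_mask` are 
hypothetical. Neither is the actual `compute_null_mask` implementation.

```rust
/// Hypothetical stand-in for an "any argument is NULL" mask: row `i` is
/// marked NULL when either the element or the count is NULL.
fn any_null_mask<T>(elements: &[Option<T>], counts: &[Option<i64>]) -> Vec<bool> {
    elements
        .iter()
        .zip(counts)
        .map(|(e, c)| e.is_none() || c.is_none())
        .collect()
}

/// Mask matching the Spark behavior described in this issue: only a NULL
/// count marks the row as NULL; the element is not inspected at all.
fn count_only_mask(counts: &[Option<i64>]) -> Vec<bool> {
    counts.iter().map(|c| c.is_none()).collect()
}
```

   For the rows `('x', 3)`, `(NULL, 2)`, `(NULL, 0)`, `('x', NULL)`, the first 
mask flags the last three rows as NULL, while the second flags only the last.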
   
   **Fix:** Only check the second argument (count) for NULL, not the first 
argument (element).
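
   As a rough illustration of the intended semantics (not the actual patch), 
the per-row behavior after the fix could be modeled as below. The function name 
`array_repeat_row` is hypothetical, `None` stands in for SQL NULL, and the 
handling of negative counts is an assumption that this issue does not cover.

```rust
/// Hypothetical per-row model of the fixed behavior; `None` represents SQL NULL.
fn array_repeat_row<T: Clone>(
    element: Option<T>,
    count: Option<i64>,
) -> Option<Vec<Option<T>>> {
    // Only a NULL count short-circuits to a NULL result.
    let n = count?;
    // Assumption: a negative count is treated as zero (empty array).
    let n = n.max(0) as usize;
    // A NULL element is repeated like any other value.
    Some(vec![element; n])
}
```

   With this model, `array_repeat_row::<&str>(None, Some(2))` yields 
`Some(vec![None, None])`, and `array_repeat_row(Some("x"), None)` yields `None`, 
matching the Spark column of the table above.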
   
   The `.slt` expected values at lines 59 and 79 will also need to be corrected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

