sepuri sai krishna created SPARK-57665:
------------------------------------------

             Summary: slice() returns an empty array for a large length due to 
int overflow in the interpreted path
                 Key: SPARK-57665
                 URL: https://issues.apache.org/jira/browse/SPARK-57665
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.3
            Reporter: sepuri sai krishna


slice(array, start, length) silently drops all elements and returns an empty 
array when `length` is large enough that `start_0based + length` overflows a 
32-bit int.

How to reproduce (Spark 4.0+, no config needed; verified on released Spark 
4.0.3, Scala 2.13.16, in spark-shell):

SELECT slice(array(1,2,3,4,5,6), 2, 2147483647)
    => []                      (expected: [2,3,4,5,6])

Root cause: Slice.nullSafeEval computes data.slice(startIndex, startIndex + 
lengthInt). For a large length, startIndex + lengthInt overflows to a negative 
`until`. Under Scala 2.13 (Spark 4.0+), Seq.slice with a negative `until` 
yields an empty result, so the whole tail is dropped. The codegen path uses 
ArrayExpressionUtils.sliceLength, which clamps to the remaining element count 
and returns the correct tail, so the two execution paths disagree. (Spark 3.5 / 
Scala 2.12 is unaffected: 2.12's slice overflows twice and accidentally 
returns the correct elements.)
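
The overflow itself can be illustrated outside Spark. The following is a 
minimal sketch in plain Java (JVM int arithmetic is the same as in Scala); the 
class and method names are hypothetical and only mirror the `startIndex + 
lengthInt` expression quoted above, using the values from the reproduction 
query.

```java
public class SliceOverflowDemo {
    // Unchecked 32-bit addition, mirroring `startIndex + lengthInt` from the report.
    // Hypothetical helper for illustration only; not Spark code.
    static int uncheckedUntil(int startIndex, int length) {
        return startIndex + length; // wraps around silently on overflow
    }

    public static void main(String[] args) {
        int startIndex = 2 - 1;         // SQL start = 2 is 1-based, so the 0-based index is 1
        int length = Integer.MAX_VALUE; // 2147483647, as in the reproduction query
        int until = uncheckedUntil(startIndex, length);
        System.out.println(until);              // prints -2147483648
        System.out.println(until < startIndex); // prints true: the end bound lies before the start
    }
}
```

With a negative end bound, any slice implementation that treats `until <= 
from` as an empty range returns no elements, which matches the behaviour 
described for Scala 2.13.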

For constant arguments the wrong value is produced even by default, because 
ConstantFolding evaluates the expression via the interpreted eval() at plan 
time.

Context: SPARK-57171 extracted the index arithmetic into 
ArrayExpressionUtils.sliceLength and routed the codegen path through it, but 
the interpreted path (Slice.nullSafeEval) was left computing 
data.slice(startIndex, startIndex + lengthInt) directly. Proposed fix: route 
the interpreted path through the same sliceLength helper so both paths agree 
and the index arithmetic cannot overflow.
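
As a rough illustration of the clamping idea, here is a minimal sketch in plain 
Java. It is not the actual ArrayExpressionUtils.sliceLength (whose signature is 
not shown in this report); `clampedLength` and `slice` are hypothetical names, 
and the sketch assumes a non-negative 0-based startIndex and a non-negative 
length.

```java
import java.util.Arrays;

public class ClampedSliceSketch {
    // Hypothetical helper: clamp the requested length to the elements that remain
    // after startIndex. Assumes startIndex >= 0 and length >= 0.
    static int clampedLength(int startIndex, int length, int numElements) {
        if (startIndex >= numElements) {
            return 0; // nothing left to take
        }
        int remaining = numElements - startIndex; // difference of non-negative ints, cannot overflow
        return Math.min(length, remaining);
    }

    // Slice using the clamped length, so startIndex + n never exceeds data.length.
    static int[] slice(int[] data, int startIndex, int length) {
        int n = clampedLength(startIndex, length, data.length);
        if (n == 0) {
            return new int[0];
        }
        return Arrays.copyOfRange(data, startIndex, startIndex + n);
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6};
        // Same inputs as the reproduction query: 1-based start 2 becomes 0-based index 1.
        System.out.println(Arrays.toString(slice(data, 1, Integer.MAX_VALUE))); // prints [2, 3, 4, 5, 6]
    }
}
```

Because the length is clamped before it is added to the start index, the sum is 
bounded by the array size and the addition that overflowed in the interpreted 
path never sees an out-of-range operand.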


!image-2026-06-24-20-40-03-883.png!


