[ https://issues.apache.org/jira/browse/SPARK-57665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sepuri sai krishna updated SPARK-57665:
---------------------------------------
    Attachment: Screenshot from 2026-06-24 20-38-51.png

> slice() returns an empty array for a large length due to int overflow in the 
> interpreted path
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57665
>                 URL: https://issues.apache.org/jira/browse/SPARK-57665
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.3
>            Reporter: sepuri sai krishna
>            Priority: Major
>         Attachments: Screenshot from 2026-06-24 20-38-51.png
>
>
> slice(array, start, length) silently drops all elements and returns an empty 
> array when `length` is large enough that `start_0based + length` overflows a 
> 32-bit int.
> How to reproduce (Spark 4.0+, no config needed); verified on released Spark
> 4.0.3 (Scala 2.13.16) in spark-shell:
> SELECT slice(array(1,2,3,4,5,6), 2, 2147483647)
>     => []                      (expected: [2,3,4,5,6])
> Root cause: Slice.nullSafeEval computes data.slice(startIndex, startIndex +
> lengthInt). For a large length, startIndex + lengthInt overflows to a negative
> `until` (in the example above, 1 + 2147483647 wraps to -2147483648). Under
> Scala 2.13 (Spark 4.0+), Seq.slice with a negative `until` yields an empty
> result, so the whole tail is dropped. The codegen path uses
> ArrayExpressionUtils.sliceLength, which clamps to the remaining element count
> and returns the correct tail, so the two execution paths disagree. (Spark 3.5
> / Scala 2.12 is unaffected: 2.12's slice arithmetic overflows a second time
> and accidentally returns the correct elements.)
> For constant arguments, the wrong value is produced even with the default
> configuration (no need to force interpreted evaluation), because
> ConstantFolding evaluates the expression via the interpreted eval() at plan
> time.
> Context: SPARK-57171 extracted the index arithmetic into 
> ArrayExpressionUtils.sliceLength and routed the codegen path through it, but 
> the interpreted path (Slice.nullSafeEval) was left computing 
> data.slice(startIndex, startIndex + lengthInt) directly. Proposed fix: route 
> the interpreted path through the same sliceLength helper so both paths agree 
> and the index arithmetic cannot overflow.
> [Screenshot: image-2026-06-24-20-40-03-883.png]
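The overflow can be reproduced outside Spark with plain Scala 2.13 collections (Scala 3 uses the same standard library). This is a sketch under stated assumptions: a `Vector[Int]` stands in for Spark's `ArrayData`, and the 0-based start index that `slice(..., 2, ...)` produces is hard-coded as 1. `SliceOverflowDemo` and its members are hypothetical names used only for illustration.

```scala
object SliceOverflowDemo {
  // Stand-in for array(1,2,3,4,5,6); assumption: not Spark's ArrayData.
  val data: Seq[Int] = Vector(1, 2, 3, 4, 5, 6)

  // SQL start = 2 is 1-based, so the 0-based index is 1.
  val startIndex: Int = 1
  val lengthInt: Int = Int.MaxValue

  // 32-bit addition wraps around: 1 + 2147483647 == -2147483648.
  val untilIndex: Int = startIndex + lengthInt

  // Scala 2.13's slice treats an `until` at or below `from` as an empty range.
  def overflowingSlice: Seq[Int] = data.slice(startIndex, untilIndex)

  // Widening to Long keeps the intended upper bound (2147483648).
  val intendedUntil: Long = startIndex.toLong + lengthInt.toLong

  // Clamping the wide bound to the array length recovers the expected tail.
  def intendedSlice: Seq[Int] =
    data.slice(startIndex, math.min(intendedUntil, data.length.toLong).toInt)
}
```

`overflowingSlice` is empty, matching the `[]` in the repro, while `intendedSlice` is `Vector(2, 3, 4, 5, 6)`, matching the expected result.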
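A minimal sketch of the proposed direction, assuming a clamping helper shaped like the one described for ArrayExpressionUtils.sliceLength. The issue does not give that helper's actual signature, so `clampedSliceLength` and `safeSlice` below are hypothetical names. The idea is only that clamping the length to the remaining element count keeps `startIndex + length` at most `numElements`, so the addition cannot overflow.

```scala
object SliceFixSketch {
  // Hypothetical helper, NOT Spark's actual ArrayExpressionUtils.sliceLength:
  // clamp the requested length to the number of elements left after startIndex.
  // Assumes 0 <= startIndex < numElements and length >= 0 (bounds the
  // interpreted path is assumed to check before slicing), so
  // numElements - startIndex is positive and cannot overflow.
  def clampedSliceLength(startIndex: Int, length: Int, numElements: Int): Int =
    math.min(length, numElements - startIndex)

  // Stand-in for the interpreted path: with the clamped length, the upper
  // bound startIndex + len is at most data.length, so it stays a valid Int.
  def safeSlice[A](data: IndexedSeq[A], startIndex: Int, length: Int): IndexedSeq[A] = {
    val len = clampedSliceLength(startIndex, length, data.length)
    data.slice(startIndex, startIndex + len)
  }
}
```

For example, `SliceFixSketch.safeSlice(Vector(1, 2, 3, 4, 5, 6), 1, Int.MaxValue)` returns `Vector(2, 3, 4, 5, 6)`, the result the repro expects, and small lengths behave as before (`safeSlice(..., 1, 2)` returns `Vector(2, 3)`).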



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
