[
https://issues.apache.org/jira/browse/SPARK-57665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sepuri sai krishna updated SPARK-57665:
---------------------------------------
Attachment: Screenshot from 2026-06-24 20-38-51.png
> slice() returns an empty array for a large length due to int overflow in the
> interpreted path
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-57665
> URL: https://issues.apache.org/jira/browse/SPARK-57665
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.3
> Reporter: sepuri sai krishna
> Priority: Major
> Attachments: Screenshot from 2026-06-24 20-38-51.png
>
>
> slice(array, start, length) silently drops all elements and returns an empty
> array when `length` is large enough that `start_0based + length` overflows a
> 32-bit int.
> How to reproduce (Spark 4.0+, no config needed): Verified on released Spark
> 4.0.3 (Scala 2.13.16) in spark-shell.
> SELECT slice(array(1,2,3,4,5,6), 2, 2147483647)
> => [] (expected: [2,3,4,5,6])
> Root cause: Slice.nullSafeEval computes data.slice(startIndex, startIndex +
> lengthInt). For a large length, startIndex + lengthInt overflows to a negative
> `until`. Under Scala 2.13 (Spark 4.0+), Seq.slice with a negative `until`
> yields an empty result, so the whole tail is dropped. The codegen path uses
> ArrayExpressionUtils.sliceLength, which clamps to the remaining element count
> and returns the correct tail, so the two execution paths disagree. (Spark 3.5
> / Scala 2.12 is unaffected: 2.12's slice overflows twice and accidentally
> returns the correct elements.)
> For constant arguments, the wrong value is produced even under the default
> configuration, because ConstantFolding evaluates the expression via the
> interpreted eval() at plan time.
> Context: SPARK-57171 extracted the index arithmetic into
> ArrayExpressionUtils.sliceLength and routed the codegen path through it, but
> the interpreted path (Slice.nullSafeEval) was left computing
> data.slice(startIndex, startIndex + lengthInt) directly.
> Proposed fix: route the interpreted path through the same sliceLength helper
> so both paths agree and the index arithmetic cannot overflow.
> !image-2026-06-24-20-40-03-883.png!
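
The overflow described under "Root cause" can be illustrated outside Spark.
The following is a minimal Java sketch, not Spark's code: the class
SliceOverflowSketch and the methods naiveUntil and clampedLength are
hypothetical names introduced for illustration. clampedLength only
approximates the clamping behavior the report attributes to
ArrayExpressionUtils.sliceLength, whose actual signature is not shown in this
message. The values come from the reproduction above (start = 2, so a 0-based
start index of 1, and length = 2147483647 on a 6-element array).

```java
// Illustration only: hypothetical helpers, not Spark's implementation.
class SliceOverflowSketch {

    // Mirrors the `startIndex + lengthInt` arithmetic from the report.
    // Java int addition wraps around silently on overflow.
    static int naiveUntil(int startIndex, int length) {
        return startIndex + length;
    }

    // Assumed clamping strategy: never take more elements than remain
    // after startIndex, so no addition involving `length` is needed.
    static int clampedLength(int startIndex, int length, int numElements) {
        int remaining = Math.max(numElements - startIndex, 0);
        return Math.min(length, remaining);
    }

    public static void main(String[] args) {
        int startIndex = 1;              // 0-based form of start = 2
        int length = Integer.MAX_VALUE;  // 2147483647
        int numElements = 6;             // array(1,2,3,4,5,6)

        // The naive end index wraps to a negative value.
        System.out.println(naiveUntil(startIndex, length));

        // The clamped length stays within the array bounds.
        System.out.println(clampedLength(startIndex, length, numElements));
    }
}
```

Running main prints -2147483648 for the naive end index and 5 for the clamped
length, which matches the 5-element tail [2,3,4,5,6] expected in the report.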