This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
new 9d45e6fc36b0 [SPARK-55802][SQL][4.0] Fix integer overflow when computing Arrow batch bytes
9d45e6fc36b0 is described below
commit 9d45e6fc36b02c84051743bcad7a372688c2940a
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Wed Mar 4 10:17:21 2026 -0800
[SPARK-55802][SQL][4.0] Fix integer overflow when computing Arrow batch bytes
### What changes were proposed in this pull request?
Change the accumulator and return type of `ArrowWriter.sizeInBytes()` from
`Int` to `Long`.
### Why are the changes needed?
`ArrowWriter.sizeInBytes()` accumulated per-column buffer sizes (each an
`Int`) into an `Int` accumulator. When the total exceeds `Int.MaxValue`
(about 2 GiB), the sum silently wraps negative, causing the byte-limit check
controlled by `spark.sql.execution.arrow.maxBytesPerBatch` to behave
incorrectly and potentially allow oversized batches through.
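The wraparound can be reproduced outside Spark. The following is a minimal sketch, not the actual `ArrowWriter` code: the object name `OverflowSketch`, the two column sizes, and the 2 GiB limit are hypothetical values chosen for illustration. It sums the same `Int` sizes once into an `Int` accumulator and once into a `Long` accumulator.

```scala
object OverflowSketch {
  // Hypothetical per-column buffer sizes in bytes; each one fits in an Int.
  val columnSizes: Array[Int] = Array(1500000000, 1500000000)

  // Hypothetical byte limit (2 GiB), standing in for the configured maximum.
  val maxBytesPerBatch: Long = 2L * 1024 * 1024 * 1024

  // Int accumulator: the sum wraps around once it passes Int.MaxValue.
  def sumAsInt(sizes: Array[Int]): Int = {
    var i = 0
    var bytes = 0
    while (i < sizes.length) {
      bytes += sizes(i)
      i += 1
    }
    bytes
  }

  // Long accumulator: each Int is widened to Long before the addition.
  def sumAsLong(sizes: Array[Int]): Long = {
    var i = 0
    var bytes = 0L
    while (i < sizes.length) {
      bytes += sizes(i)
      i += 1
    }
    bytes
  }

  def main(args: Array[String]): Unit = {
    val wrapped = sumAsInt(columnSizes)  // -1294967296
    val correct = sumAsLong(columnSizes) // 3000000000
    println(s"Int accumulator:  $wrapped, over limit: ${wrapped > maxBytesPerBatch}")
    println(s"Long accumulator: $correct, over limit: ${correct > maxBytesPerBatch}")
  }
}
```

With these inputs the `Int` sum is negative and never compares as over the limit, while the `Long` sum correctly exceeds it.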
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6 <noreplyanthropic.com>
Closes #54624 from viirya/backport-arrow-batch-bytes-overflow-branch-4.0.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
.../main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
index d91b6de9b1df..4a68cf6c8f9f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
@@ -112,9 +112,9 @@ class ArrowWriter(val root: VectorSchemaRoot, fields: Array[ArrowFieldWriter]) {
     count += 1
   }
 
-  def sizeInBytes(): Int = {
+  def sizeInBytes(): Long = {
     var i = 0
-    var bytes = 0
+    var bytes = 0L
     while (i < fields.size) {
       bytes += fields(i).getSizeInBytes()
       i += 1