This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new df31a9708bcb [SPARK-56904][SQL] Fix Int overflow in LongToUnsafeRowMap page size computations
df31a9708bcb is described below
commit df31a9708bcbe458d0af5c87664ba536be283b7d
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Sun May 17 15:00:41 2026 -0700
[SPARK-56904][SQL] Fix Int overflow in LongToUnsafeRowMap page size computations
### What changes were proposed in this pull request?
Fix three sites in `LongToUnsafeRowMap` where a page-word count, read or computed as a `Long` and then narrowed to `Int`, is multiplied by 8 in `Int` arithmetic. At the upper bound (`1 << 30` long words, enforced by the explicit cap in `grow` and the 8 GiB ceiling), `Int * 8` wraps to 0. The affected sites are:
- `LongToUnsafeRowMap.grow`: `val newPage = allocatePage(newNumWords.toInt * 8)`
- `LongToUnsafeRowMap.read` (deserialization on executors):
  - `page = allocatePage(pageLength * 8)`
  - `cursor = pageLength * 8 + page.getBaseOffset`
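The boundary arithmetic can be checked in isolation. The sketch below uses hypothetical helper names (`bytesIntMultiply`, `bytesLongMultiply`, not Spark APIs) that mirror the pre- and post-fix expressions from `grow`. It also shows that the `Int` product already turns negative at `1 << 28` words (a 2 GiB page), before wrapping to exactly 0 at `1 << 29` and `1 << 30` words.

```scala
// Hypothetical helpers (not Spark APIs) mirroring the pre- and post-fix
// expressions from `grow`.
object PageSizeArithmetic {
  // Pre-fix: narrow the Long word count to Int, multiply in Int arithmetic,
  // and only then widen the (possibly wrapped) result to Long.
  def bytesIntMultiply(numWords: Long): Long = (numWords.toInt * 8).toLong

  // Post-fix: multiply in Long arithmetic, which cannot overflow for any
  // word count up to the 1 << 30 cap.
  def bytesLongMultiply(numWords: Long): Long = numWords * 8L

  def main(args: Array[String]): Unit = {
    for (shift <- Seq(27, 28, 29, 30)) {
      val words = 1L << shift
      println(s"words = 2^$shift: Int * 8 = ${bytesIntMultiply(words)}, " +
        s"Long * 8L = ${bytesLongMultiply(words)}")
    }
    // Prints:
    // words = 2^27: Int * 8 = 1073741824, Long * 8L = 1073741824
    // words = 2^28: Int * 8 = -2147483648, Long * 8L = 2147483648
    // words = 2^29: Int * 8 = 0, Long * 8L = 4294967296
    // words = 2^30: Int * 8 = 0, Long * 8L = 8589934592
  }
}
```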
When the product overflows to 0, `MemoryConsumer.allocatePage(0)` calls `TaskMemoryManager.allocatePage(Math.max(pageSize, 0))` and returns a default-sized page. Subsequent `append`s keep advancing `cursor` past the new page's end, so `Platform.copyMemory(... page.getBaseObject, cursor, ...)` reads and writes into adjacent native pages. The JVM eventually crashes inside the SIMD-optimized `StubRoutines::forward_copy_longs` stub on aarch64 (SEGV_ACCERR when the copy over-reads into the next mmap'd page).
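The failure does not surface at allocation time. The following is a deliberately simplified model of the fallback described above, not Spark's code. `allocatedBytes` mirrors the `Math.max(pageSize, required)` step, `overrunBytes` is a hypothetical helper measuring how far a cursor lands past the returned page, and both the 64 MiB default page size and the 4 GiB cursor position are arbitrary illustrative assumptions.

```scala
// Deliberately simplified model, not Spark's code: a "page" is reduced to its
// capacity in bytes, and `defaultPageSize` is an arbitrary assumed value.
object AllocationFallbackModel {
  // Mirrors the `Math.max(pageSize, required)` step: in this model a request
  // of 0 bytes (or a negative, wrapped request) yields a default-sized page.
  def allocatedBytes(required: Long, defaultPageSize: Long): Long =
    math.max(defaultPageSize, required)

  // How many bytes past the page's end a cursor at `cursorOffset` points.
  def overrunBytes(cursorOffset: Long, capacity: Long): Long =
    math.max(0L, cursorOffset - capacity)

  def main(args: Array[String]): Unit = {
    val defaultPageSize = 64L << 20              // assumed 64 MiB default page
    val words = 1L << 30                         // the cap used by `grow`
    val buggyRequest = (words.toInt * 8).toLong  // wraps to 0
    val fixedRequest = words * 8L                // 8 GiB
    // Suppose 4 GiB of rows are already stored, so the cursor sits 4 GiB in.
    val cursorOffset = 4L << 30
    for ((label, request) <- Seq("pre-fix" -> buggyRequest, "post-fix" -> fixedRequest)) {
      val capacity = allocatedBytes(request, defaultPageSize)
      println(s"$label: capacity = $capacity, overrun = ${overrunBytes(cursorOffset, capacity)}")
    }
    // Prints:
    // pre-fix: capacity = 67108864, overrun = 4227858432
    // post-fix: capacity = 8589934592, overrun = 0
  }
}
```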
We observed the crash on ARM Graviton, and this fix resolves it. The underlying bug, however, is latent memory corruption regardless of architecture.
Fix: use `Long` multiplication (`* 8L`) at all three sites, dropping the `.toInt` narrowing in `grow`, so the product has the same `Long` type as `allocatePage`'s parameter and `cursor`.
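On the deserialization side, the fix keeps `pageLength` as an `Int` (obtained via `readLong().toInt`) and widens only the products. A minimal sketch of that arithmetic, assuming hypothetical helper names; `DataInputStream` stands in for the map's own input helpers, and `baseOffset` is an arbitrary stand-in for `page.getBaseOffset`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Sketch of the size arithmetic on the `read` (deserialization) path. Helper
// names are hypothetical; `baseOffset` stands in for `page.getBaseOffset`.
object ReadPathSketch {
  // Post-fix allocation size: `Int * Long` is evaluated in Long arithmetic.
  def pageBytes(pageLength: Int): Long = pageLength * 8L

  // Post-fix cursor restore: the whole sum stays in Long arithmetic.
  def restoredCursor(pageLength: Int, baseOffset: Long): Long =
    pageLength * 8L + baseOffset

  def main(args: Array[String]): Unit = {
    // Serialize a page length of 1 << 30 words as a Long, then read it back
    // the way `read` does: `readLong().toInt`.
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeLong(1L << 30)
    out.close()
    val in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray))
    val pageLength = in.readLong().toInt // 1073741824 still fits in an Int
    in.close()

    val baseOffset = 16L // arbitrary illustrative offset
    println(s"pre-fix request:  ${pageLength * 8}")                         // Int arithmetic: 0
    println(s"post-fix request: ${pageBytes(pageLength)}")                  // 8589934592
    println(s"post-fix cursor:  ${restoredCursor(pageLength, baseOffset)}") // 8589934608
  }
}
```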
### Why are the changes needed?
To fix a JVM SEGV in `LongToUnsafeRowMap` triggered when the page reaches
the 8 GiB cap, observed on ARM Graviton.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing `HashedRelationSuite` tests cover the affected code paths. The fix was also validated on a downstream broadcast-hash-join build on ARM Graviton where the original SEGV reproduced; with the fix applied, the crash no longer occurs.
The reproducer suite is internal and hard to port to OSS, but the bug is evident from reading the code.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code
Closes #55929 from viirya/SPARK-54116-fix-int-overflow.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit bccbf2234a97e52830c3f6417806e0fe25a7c229)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
.../apache/spark/sql/execution/joins/HashedRelation.scala | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
index 242185e80357..7712fdc9f6cc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
@@ -825,7 +825,13 @@ private[execution] final class LongToUnsafeRowMap(
throw QueryExecutionErrors.cannotBuildHashedRelationLargerThan8GError()
}
val newNumWords = math.max(neededNumWords, math.min(page.size() / 8 * 2, 1 << 30))
- val newPage = allocatePage(newNumWords.toInt * 8)
+ // newNumWords is a Long up to 1 << 30. Multiplying by 8 must stay in Long
+ // arithmetic; `newNumWords.toInt * 8` (Int * Int) overflows to 0 at the
+ // upper bound, causing `allocatePage(0)` to fall back to the default page
+ // size while subsequent writes still advance `cursor` past the new page's
+ // end (heap corruption observed as a `forward_copy_longs` SEGV during
+ // BHJ build on aarch64).
+ val newPage = allocatePage(newNumWords * 8L)
Platform.copyMemory(page.getBaseObject, page.getBaseOffset, newPage.getBaseObject,
newPage.getBaseOffset, usedBytes)
freePage(page)
@@ -966,10 +972,13 @@ private[execution] final class LongToUnsafeRowMap(
readData(readBuffer, array.memoryBlock.getBaseObject, array.memoryBlock.getBaseOffset, length)
val pageLength = readLong().toInt
freePage(page)
- page = allocatePage(pageLength * 8)
+ // Use Long multiplication: pageLength can be up to 1 << 30 (8 GiB page / 8),
+ // and `Int * Int` overflows at that bound, leading to a 0-byte allocatePage
+ // and a subsequent cursor that runs past the page's end.
+ page = allocatePage(pageLength * 8L)
readData(readBuffer, page.getBaseObject, page.getBaseOffset, pageLength)
// Restore cursor variable to make this map able to be serialized again on executors.
- cursor = pageLength * 8 + page.getBaseOffset
+ cursor = pageLength * 8L + page.getBaseOffset
}
override def readExternal(in: ObjectInput): Unit = {