nateab opened a new pull request, #27510:
URL: https://github.com/apache/flink/pull/27510
…en data exceeds 2GB
## What is the purpose of the change
This pull request fixes an IndexOutOfBoundsException in
AbstractBytesMultiMap that occurs when window aggregate queries (using TUMBLE
TVF with LAST_VALUE aggregations) process large amounts of data with the
memory state backend. When the value/key area exceeds ~2GB
(Integer.MAX_VALUE bytes), casting the long offset to int overflows to a
negative value, causing invalid array access.
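
The overflow itself can be reproduced outside Flink. The following is a minimal sketch, not code from AbstractBytesMultiMap; the class and method names are hypothetical and only illustrate what a narrowing cast does once a long offset passes Integer.MAX_VALUE.

```java
public class OffsetOverflowDemo {

    /** Narrowing cast, as happens when a long offset is used as an int index. */
    static int unsafeCast(long offset) {
        return (int) offset;
    }

    public static void main(String[] args) {
        long lastValid = Integer.MAX_VALUE;   // largest offset an int can hold
        long firstInvalid = lastValid + 1L;   // one byte past the int range

        System.out.println(unsafeCast(lastValid));    // 2147483647
        System.out.println(unsafeCast(firstInvalid)); // -2147483648
    }
}
```

A negative index of this kind is what later surfaces as the IndexOutOfBoundsException.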
## Brief change log
- Added bounds checking in writePointer() to validate offset before
casting long to int
- Added bounds checking in skipPointer() to validate offset before casting
long to int
- Added bounds checking in appendRecord() to validate key area offset
before casting
- Removed redundant post-cast checks in appendValue() and appendRecord()
since writePointer() now handles validation
- Ensures EOFException is thrown with a clear warning message before any
overflow can occur
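
The pattern behind these changes can be sketched as a range check that runs before the cast. This is an illustration under assumptions: `CheckedOffset` and `toIntOffset` are hypothetical names, and the message text is invented; the actual checks live inside writePointer(), skipPointer() and appendRecord().

```java
import java.io.EOFException;

public class CheckedOffset {

    /**
     * Validates a long offset before narrowing it to int.
     * Hypothetical helper; not the method added by this pull request.
     */
    static int toIntOffset(long offset) throws EOFException {
        if (offset < 0 || offset > Integer.MAX_VALUE) {
            // Fail before the cast so the caller never sees a wrapped, negative index.
            throw new EOFException(
                    "Offset " + offset + " is outside the int-addressable range.");
        }
        return (int) offset;
    }

    public static void main(String[] args) throws EOFException {
        System.out.println(toIntOffset(42L)); // 42
        try {
            toIntOffset(Integer.MAX_VALUE + 1L);
        } catch (EOFException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking before the cast costs a single comparison per call.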
## Verifying this change
This change is already covered by existing tests, such as:
- BytesMultiMapTest - tests basic map operations
- WindowBytesMultiMapTest - tests window-specific map operations
- RecordsWindowBufferTest - tests the window buffer that uses the map
The fix adds validation that throws EOFException earlier in the code path,
before the long offset is cast to int. The existing EOFException handling in
RecordsWindowBuffer.addElement() already covers this case by flushing the
buffer and retrying.
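
The flush-and-retry behaviour described above can be sketched as follows. This is not the RecordsWindowBuffer implementation: `BoundedBuffer` is a hypothetical stand-in for the bytes map with a tiny capacity, used only to show how an EOFException from the append path leads to one flush and one retry.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.List;

public class FlushAndRetryDemo {

    /** Hypothetical bounded buffer standing in for the bytes map. */
    static final class BoundedBuffer {
        private final int capacity;
        private final List<String> records = new ArrayList<>();
        int flushCount = 0;

        BoundedBuffer(int capacity) {
            this.capacity = capacity;
        }

        void append(String record) throws EOFException {
            if (records.size() >= capacity) {
                throw new EOFException("buffer full");
            }
            records.add(record);
        }

        void flush() {
            records.clear(); // a real buffer would emit its contents downstream first
            flushCount++;
        }

        int size() {
            return records.size();
        }
    }

    /** Appends a record; on EOFException, flushes and retries once. */
    static void addElement(BoundedBuffer buffer, String record) throws EOFException {
        try {
            buffer.append(record);
        } catch (EOFException e) {
            buffer.flush();
            buffer.append(record); // a second failure propagates to the caller
        }
    }

    public static void main(String[] args) throws EOFException {
        BoundedBuffer buffer = new BoundedBuffer(2);
        for (String r : new String[] {"a", "b", "c"}) {
            addElement(buffer, r);
        }
        System.out.println(buffer.flushCount + " flush, " + buffer.size() + " buffered");
    }
}
```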
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
@Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no (only adds
a comparison before existing cast)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable