sauliusvl opened a new pull request, #28252:
URL: https://github.com/apache/flink/pull/28252

   ## What is the purpose of the change
   
   Fixes [FLINK-39754](https://issues.apache.org/jira/browse/FLINK-39754). `DataOutputSerializer.resize()` computes the doubled size as `buffer.length * 2` in `int` arithmetic. Once `buffer.length` exceeds `Integer.MAX_VALUE / 2` (~1.07 GB), the doubling overflows to a negative `int`. `Math.max` then picks `buffer.length + minCapacityAdd`, so every subsequent resize grows the buffer by only `minCapacityAdd` bytes (often just a few) instead of doubling it. Each of those calls still does a full `System.arraycopy` of the 1+ GB buffer. On large heaps this shows up as a silent O(n²) hang. The hang lasts until `buffer.length + minCapacityAdd` itself overflows and the existing `catch (NegativeArraySizeException)` turns the failure into an `IOException`.
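
   The following hypothetical standalone demo makes the arithmetic concrete. It is not Flink code, and the class and method names are made up for illustration. It replays the pre-fix growth expression `Math.max(buffer.length * 2, buffer.length + minCapacityAdd)` with plain `int`s and allocates nothing. It counts the resizes and copied bytes needed to reach the point where the old code finally failed. As a worst case, it assumes every write asks for just 8 more bytes. Copy volume is approximated by the full buffer length per resize.

```java
/**
 * Hypothetical standalone demo, not Flink code: replays the pre-fix growth
 * rule with plain int arithmetic and no array allocations.
 */
public class ResizeOverflowDemo {

    /** Pre-fix growth rule as described in the PR: int arithmetic throughout. */
    static int oldNewLength(int currentLength, int minCapacityAdd) {
        return Math.max(currentLength * 2, currentLength + minCapacityAdd);
    }

    /**
     * Replays the old rule from startLength until currentLength + minCapacityAdd
     * wraps negative (where the old code finally failed).
     * Returns {number of resizes, approximate bytes copied}.
     */
    static long[] simulateOldGrowth(int startLength, int minCapacityAdd) {
        int len = startLength;
        long resizes = 0;
        long bytesCopied = 0;
        while (len + minCapacityAdd > 0) {
            // Approximation: resize() copies `position` bytes, which is close to
            // the full buffer length whenever a resize is triggered.
            bytesCopied += len;
            len = oldNewLength(len, minCapacityAdd);
            resizes++;
        }
        return new long[] {resizes, bytesCopied};
    }

    public static void main(String[] args) {
        int len = 1 << 30; // 1,073,741,824 bytes, one above Integer.MAX_VALUE / 2

        System.out.println(len * 2);              // -2147483648: doubling overflowed
        System.out.println(oldNewLength(len, 8)); // 1073741832: grew by only 8 bytes

        // Worst case: every write needs just 8 more bytes.
        long[] cost = simulateOldGrowth(len, 8);
        System.out.println(cost[0] + " resizes, ~" + cost[1] + " bytes copied");
    }
}
```

Starting from a 2^30-byte buffer, the simulation counts 134,217,727 resizes and roughly 2.2 × 10^17 bytes (about 192 PiB) copied before `buffer.length + 8` wraps. That is a worst-case bound, but it shows why the symptom is a hang rather than a quick failure.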
   
   ## Brief change log
   
     - Extract the size computation from `resize(int)` into a `@VisibleForTesting` package-private static helper, `computeNewBufferLength(int, int)`.
     - The helper uses `long` arithmetic and validates the required size against a new `MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8` cap (matching `java.util.ArrayList`). It jumps straight to the cap when doubling would overflow, so serializations that only barely fit under 2 GB still complete instead of grinding through a linear-step resize loop.
     - Remove the now-unreachable `catch (NegativeArraySizeException)` block from `resize`. The existing `OutOfMemoryError` retry path is preserved because it addresses an independent concern: a doubled size that exceeds the available heap.
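
   Below is a minimal sketch of a growth helper along these lines. It assumes the signature `computeNewBufferLength(int currentLength, int minCapacityAdd)`, returning the new length or throwing `IOException`. The class name, parameter names, and exception message are assumptions, and this is not the code in the PR. The `main` method walks through normal doubling, `minCapacityAdd`-dominated growth, the jump to the cap, the exact-cap boundary, and the over-cap failure.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the buffer-growth computation described in the PR.
 * Not the actual Flink implementation; names and message text are assumptions.
 */
public final class BufferGrowthSketch {

    // Soft cap also used by java.util.ArrayList: some JVMs reserve header words
    // in arrays, so lengths right at Integer.MAX_VALUE may fail to allocate.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    private BufferGrowthSketch() {}

    /**
     * Returns the new length for a buffer of currentLength bytes that needs at
     * least minCapacityAdd more bytes, or throws if that cannot fit in an array.
     */
    static int computeNewBufferLength(int currentLength, int minCapacityAdd)
            throws IOException {
        // long arithmetic: the sum and the product below cannot overflow for int inputs.
        long required = (long) currentLength + minCapacityAdd;
        if (required > MAX_ARRAY_SIZE) {
            throw new IOException(
                    "Serialized data needs "
                            + required
                            + " bytes, which exceeds the maximum array size of "
                            + MAX_ARRAY_SIZE
                            + " bytes.");
        }
        long doubled = (long) currentLength * 2;
        // Prefer doubling (amortized O(1) copying per byte); if doubling would pass
        // the cap, jump to the cap instead of falling back to tiny linear steps.
        long newLength = Math.max(Math.min(doubled, MAX_ARRAY_SIZE), required);
        return (int) newLength; // safe: newLength <= MAX_ARRAY_SIZE here
    }

    public static void main(String[] args) throws IOException {
        System.out.println(computeNewBufferLength(64, 8));      // 128: normal doubling
        System.out.println(computeNewBufferLength(64, 1000));   // 1064: minCapacityAdd dominates
        System.out.println(computeNewBufferLength(1 << 30, 8)); // 2147483639: jump to the cap
        System.out.println(computeNewBufferLength(MAX_ARRAY_SIZE - 8, 8)); // 2147483639: exactly the cap
        try {
            computeNewBufferLength(MAX_ARRAY_SIZE, 1);
        } catch (IOException e) {
            // Required size is one byte over the cap.
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```

This sketch clamps the doubled size to the cap whenever doubling would exceed it. That includes every case where `int` doubling would have overflowed. Whether the PR clamps in exactly the same cases is an assumption.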
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
     - Five pure-arithmetic unit tests on `computeNewBufferLength` in `DataInputOutputSerializerTest`. They cover normal doubling, `minCapacityAdd`-dominated growth, the jump to the cap when `currentLength * 2` would overflow, the exact-cap boundary, and the `IOException` thrown when the required size exceeds the cap. None of them require multi-GB allocations.
     - Existing `DataInputOutputSerializerTest` tests continue to pass, confirming that the normal write/read paths through `resize()` are unchanged.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no (`DataOutputSerializer` is unannotated / internal)
     - The serializers: no (this is the byte-buffer growth path, not record (de)serialization logic)
     - The runtime per-record code paths (performance sensitive): no (the helper runs only on buffer growth, not per record; the buggy linear-step path it replaces is what was previously degrading performance)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no behavior change for serializations under ~1 GB. Serializations near 2 GB that previously hung silently in O(n²) copying now either complete cleanly (with one final grow to the cap) or fail with an actionable `IOException`. Previously they failed with an opaque message derived from `NegativeArraySizeException`.
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (please specify the tool below)
   
   Generated-by: Claude (Anthropic, Opus 4.7) via Zed editor
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
