nateab opened a new pull request, #27524: URL: https://github.com/apache/flink/pull/27524
## Summary

Optimizes RocksDB state backend value serialization by using `ByteBuffer` API instead of `byte[]` to avoid `Arrays.copyOf()` allocation on state updates.

**Changes:**
- Adds `serializeValueToByteBuffer()` and `serializeValueNullSensitiveToByteBuffer()` methods to `AbstractRocksDBState`
- Updates single-put operations in `RocksDBValueState`, `RocksDBMapState`, `RocksDBReducingState`, `RocksDBAggregatingState`, and `AbstractRocksDBAppendingState`
- Does NOT modify WriteBatch operations (`putAll`, etc.) - shared buffers are unsafe with deferred writes

## What This Does NOT Change

- **Key serialization** - Still copies (requires `SerializedCompositeKeyBuilder` changes for full optimization - follow-up work)
- **WriteBatch/putAll operations** - Shared buffers unsafe with deferred writes

## Thread Safety

Safe because:
- Flink uses single-threaded mailbox execution per task
- `RocksDB.put()` is synchronous and copies to memtable before returning
- Buffer reused only after `put()` completes

## Benchmark Results

### Serialization Overhead (isolated)

| Value Size | getCopyOfBuffer() | getSharedBuffer() | Improvement |
|------------|-------------------|-------------------|-------------|
| 100 bytes  | 9.11 ns           | 3.73 ns           | **59%**     |
| 500 bytes  | 24.58 ns          | 10.36 ns          | **58%**     |
| 1000 bytes | 44.93 ns          | 18.90 ns          | **58%**     |

### Realistic Workload (with RocksDB I/O + simulated record processing)

| Value Size | Allocation Reduced | GC Collections | Throughput |
|------------|--------------------|----------------|------------|
| 100 bytes  | 7.6 MB (1.3%)      | 2 → 0          | +6.0%      |
| 500 bytes  | 770.6 MB (56.1%)   | 7 → 0          | +1.5%      |
| 1000 bytes | 1.7 GB (74.1%)     | 13 → 0         | +5.8%      |

### When This Optimization Helps Most

- Large state values (500+ bytes)
- High state update rates (>100K ops/sec)
- Latency-sensitive applications where GC pauses matter

## Testing

- All 839 RocksDB state backend tests pass
- Specifically verified MapState/putAll tests (WriteBatch operations unchanged)
- Added benchmarks: `SerializationBenchmark`, `GCPressureBenchmark`, `RealisticWorkloadBenchmark`

## Follow-up Work

- Zero-copy key serialization in `SerializedCompositeKeyBuilder`
- VoidNamespace fast path optimization
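
## Illustrative Sketch

The copy-versus-shared-buffer trade-off described in the Summary and Thread Safety sections can be sketched in plain Java. This is not the PR's code and uses neither Flink nor RocksDB: `ReusableOutput`, `CopyingStore`, and all method names below are hypothetical stand-ins. The sketch assumes a store whose `put()` copies the bytes before returning (the property the PR relies on), and it shows why a writer that retains the buffer reference for later would observe overwritten data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SharedBufferSketch {

    /** Hypothetical stand-in for a reusable serialization output buffer. */
    static final class ReusableOutput {
        private byte[] buffer = new byte[64];
        private int position;

        void clear() {
            position = 0;
        }

        void write(String value) {
            byte[] src = value.getBytes(StandardCharsets.UTF_8);
            if (position + src.length > buffer.length) {
                buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, position + src.length));
            }
            System.arraycopy(src, 0, buffer, position, src.length);
            position += src.length;
        }

        /** Copying path: allocates a fresh array on every call. */
        byte[] copyOfBuffer() {
            return Arrays.copyOf(buffer, position);
        }

        /** Shared path: a view over the internal array, valid only until the next clear()/write(). */
        ByteBuffer wrapAsByteBuffer() {
            return ByteBuffer.wrap(buffer, 0, position);
        }
    }

    /** Hypothetical synchronous store: copies the bytes before put() returns. */
    static final class CopyingStore {
        byte[] last;

        void put(ByteBuffer value) {
            last = new byte[value.remaining()];
            value.get(last);
        }
    }

    /** Decodes the readable bytes of a buffer without consuming the caller's view. */
    static String decode(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        ReusableOutput out = new ReusableOutput();
        CopyingStore store = new CopyingStore();

        // Safe: the store copies during put(), so reusing the buffer afterwards is fine.
        out.clear();
        out.write("first");
        store.put(out.wrapAsByteBuffer());
        out.clear();
        out.write("other");
        System.out.println(new String(store.last, StandardCharsets.UTF_8)); // prints "first"

        // Unsafe: a deferred writer keeps the view and reads it after the buffer was reused.
        out.clear();
        out.write("AAAA");
        ByteBuffer pending = out.wrapAsByteBuffer();
        out.clear();
        out.write("BBBB");
        System.out.println(decode(pending)); // prints "BBBB", the original value is lost
    }
}
```

The second half of `main` models only the concern stated in the PR (a reference held across buffer reuse); whether a particular batch API copies eagerly is a separate question that the sketch does not answer.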
