atognolas opened a new pull request, #39157: URL: https://github.com/apache/beam/pull/39157
## Summary

- Add `writers.cleanUp()` before the `openWriters >= maxNumWriters` capacity check in `RecordWriterManager.DestinationState.write()`
- Guava Cache eviction is lazy: expired entries are only removed on subsequent access or explicit `cleanUp()`, not proactively
- Without this fix, `openWriters` stays stale after writers expire, causing 100% false spill for tables with more partitions than `maxNumWriters` (default 20)

## Motivation

With `DEFAULT_MAX_WRITERS_PER_BUNDLE = 20` and more than 20 partitions, every record for the 21st+ partition is rejected as if the writer pool is full, triggering the spill path. Observed on a 400-partition table: 100% of records spilled, causing disk exhaustion.

After the fix, expired writers are properly evicted before the capacity check, and spill only occurs when the pool is genuinely full.

## Test plan

- [ ] Existing `RecordWriterManagerTest` passes
- [ ] Run IcebergIO integration tests with >20 partitions
- [ ] Verify spill rate drops to near-zero for tables with idle partitions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
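
As an illustration of the stale-counter behavior described in the Summary, here is a minimal sketch in plain Java. It is a simplified, hypothetical model, not Beam's `RecordWriterManager` and not Guava's `Cache` implementation: the class `LazyWriterPool`, its manual clock, and the `cleanUpFirst` flag are all assumptions made for the example. The model assumes expired entries are removed only when that key is accessed again or when `cleanUp()` is called, and that the open-writer counter is decremented only at removal time.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a writer pool with lazy expiry (illustration only). */
class LazyWriterPool {
    private final int maxNumWriters;
    private final long ttlMillis;
    // partition -> last access time; expired entries linger until removed
    private final Map<String, Long> lastAccess = new LinkedHashMap<>();
    private int openWriters = 0; // only decremented when an entry is actually removed
    private long now = 0;        // manual clock so the example is deterministic

    LazyWriterPool(int maxNumWriters, long ttlMillis) {
        this.maxNumWriters = maxNumWriters;
        this.ttlMillis = ttlMillis;
    }

    void advance(long millis) {
        now += millis;
    }

    /** Explicitly removes every expired entry and updates the counter. */
    void cleanUp() {
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() >= ttlMillis) {
                it.remove();
                openWriters--;
            }
        }
    }

    /** Returns true if the record is accepted, false if it would be spilled. */
    boolean write(String partition, boolean cleanUpFirst) {
        if (cleanUpFirst) {
            cleanUp(); // the proposed fix: evict before the capacity check
        }
        Long last = lastAccess.get(partition);
        if (last != null) {
            if (now - last < ttlMillis) {
                lastAccess.put(partition, now); // live writer: refresh and reuse
                return true;
            }
            lastAccess.remove(partition); // expired entry removed on access
            openWriters--;
        }
        if (openWriters >= maxNumWriters) {
            return false; // pool looks full, record is spilled
        }
        lastAccess.put(partition, now);
        openWriters++;
        return true;
    }

    /** Opens two writers, lets both expire, then writes to two new partitions. */
    static int countSpills(boolean cleanUpFirst) {
        LazyWriterPool pool = new LazyWriterPool(2, 1000);
        pool.write("p0", cleanUpFirst);
        pool.write("p1", cleanUpFirst);
        pool.advance(5000); // both writers are now idle past the TTL
        int spills = 0;
        for (String p : new String[] {"p2", "p3"}) {
            if (!pool.write(p, cleanUpFirst)) {
                spills++;
            }
        }
        return spills;
    }

    public static void main(String[] args) {
        System.out.println("spills without cleanUp: " + countSpills(false)); // 2
        System.out.println("spills with cleanUp:    " + countSpills(true));  // 0
    }
}
```

In this model, records for new partitions are rejected even though every open writer has expired, because nothing touches the expired keys and the counter never drops. Calling `cleanUp()` before the capacity check lets the counter reflect only live writers.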
