atognolas opened a new pull request, #39157:
URL: https://github.com/apache/beam/pull/39157

   ## Summary
   - Call `writers.cleanUp()` before the `openWriters >= maxNumWriters` capacity 
check in `RecordWriterManager.DestinationState.write()`
   - Guava Cache eviction is lazy: expired entries are removed only during 
maintenance on later cache operations or by an explicit `cleanUp()`, not at the 
moment they expire
   - Without this fix, `openWriters` stays stale after writers expire, so records 
are spilled even though no writer slot is actually in use. For tables with more 
partitions than `maxNumWriters` (default 20), this can reach 100% false spill
   
   ## Motivation
   With `DEFAULT_MAX_WRITERS_PER_BUNDLE = 20` and more than 20 partitions, 
every record for the 21st and later partitions is rejected as if the writer pool 
were full, which triggers the spill path. Observed on a 400-partition table: 100% 
of records spilled, causing disk exhaustion.
   
   After the fix, expired writers are properly evicted before the capacity 
check, and spill only occurs when the pool is genuinely full.
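
The failure mode can be reproduced with a small model. The sketch below is not Beam or Guava code: `LazyWriterPool`, its fake clock, and the `cleanUpFirst` flag are hypothetical names used only for illustration. It assumes that the open-writer counter is decremented only when an expired entry is actually evicted, and that eviction happens only inside `cleanUp()`.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a lazily evicting writer cache; not Guava's implementation. */
final class LazyWriterPool {
    private final Map<String, Long> lastAccess = new LinkedHashMap<>();
    private final long ttl;
    private final int maxNumWriters;
    private long now = 0;        // fake clock, advanced manually
    private int openWriters = 0; // assumed to change only on open and on eviction

    LazyWriterPool(long ttl, int maxNumWriters) {
        this.ttl = ttl;
        this.maxNumWriters = maxNumWriters;
    }

    void advance(long ticks) { now += ticks; }

    int openWriters() { return openWriters; }

    /** Evicts expired entries and decrements the counter, as an eviction callback might. */
    void cleanUp() {
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() >= ttl) {
                it.remove();
                openWriters--;
            }
        }
    }

    /** Returns true if the record gets a writer, false if it would be spilled. */
    boolean write(String partition, boolean cleanUpFirst) {
        if (cleanUpFirst) {
            cleanUp(); // the step this PR adds before the capacity check
        }
        Long last = lastAccess.get(partition);
        boolean live = last != null && now - last < ttl;
        if (!live) {
            if (openWriters >= maxNumWriters) {
                return false; // pool looks full, so the record is spilled
            }
            if (last == null) {
                openWriters++; // new writer; a stale entry for the same key keeps its slot
            }
        }
        lastAccess.put(partition, now);
        return true;
    }

    public static void main(String[] args) {
        LazyWriterPool pool = new LazyWriterPool(10, 2);
        System.out.println(pool.write("p0", false)); // true
        System.out.println(pool.write("p1", false)); // true
        System.out.println(pool.write("p2", false)); // false: pool is genuinely full
        pool.advance(20);                            // p0 and p1 are now expired
        System.out.println(pool.write("p2", false)); // false: stale counter, false spill
        System.out.println(pool.write("p2", true));  // true: cleanUp() frees both slots
    }
}
```

In this model, the fourth write is the false spill described above: both writers have expired, but the counter still reads 2 because nothing has triggered eviction yet.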
   
   ## Test plan
   - [ ] Existing `RecordWriterManagerTest` passes
   - [ ] Run IcebergIO integration tests with >20 partitions
   - [ ] Verify spill rate drops to near-zero for tables with idle partitions
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.