[PR] Decouple control topic processing from record poll loop – improve throughput & reliability [iceberg]

via GitHub Wed, 10 Dec 2025 20:33:29 -0800


kumarpritam863 opened a new pull request, #14816:
URL: https://github.com/apache/iceberg/pull/14816


   ## Problem Statement
   Currently, workers poll the **control topic** inside the main 
record-processing loop using `poll(Duration.ZERO)`. This tightly-coupled design 
leads to several critical issues:
   
   1. **Missed START_COMMIT at startup**  
      Initial partition assignment delays cause workers to frequently miss the 
first `START_COMMIT` from the coordinator → delayed snapshot creation at 
destination.
   
   2. **Throughput blocked by control-topic latency**  
      Any intermittent Kafka latency or outage on the control topic completely 
blocks file writing → severe throughput degradation.
   
   3. **Unreliable zero-duration polling**  
      `Duration.ZERO` only works reliably with constant traffic. When no 
records are buffered on the broker, workers miss `START_COMMIT` cycles (both at 
start and mid-stream).
   
   4. **Wasteful processing of irrelevant events**  
      All workers consume the same control topic → every worker unnecessarily 
processes coordinator events intended for other workers → significant CPU waste 
and reduced throughput.
   
   ## Solution
   This PR **completely decouples** control-topic processing from the main 
record-processing flow by:
   
   - Introducing **one dedicated consumer thread per worker** that exclusively 
subscribes to the control topic.
   - Pre-processing and caching coordinator events (`START_COMMIT`, 
`COMMITTED`, etc.) in a thread-safe queue.
   - Making these events instantly available to the main worker thread without 
any polling in the hot path.
   
   The main poll loop now only consumes data topics and is no longer blocked or 
distracted by the control topic.
   
   ## Benefits
   - Workers continue processing & writing records even during transient 
control-topic / Kafka issues  
   - No processing of other workers' coordinator events  
   - **Zero missed START_COMMIT cycles** (startup or intermediate) → faster & 
consistent snapshots  
   - **Significant throughput improvement** (~1.8–2.2× in load tests)  
   - **End-to-end latency reduced by ~commit-interval × 2**  
   - Memory impact is negligible → one extra thread per worker is fully 
justified
   
   ## Validation
   - Tested with 24–48 parallel workers under production-like load  
   - Verified no missed commits even with artificial delays on control topic  
   - Memory footprint remains within previous bounds  
   - All unit & integration tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Decouple control topic processing from record poll loop – improve throughput & reliability [iceberg]

Reply via email to