kumarpritam863 opened a new pull request, #14816:
URL: https://github.com/apache/iceberg/pull/14816
## Problem Statement
Currently, workers poll the **control topic** inside the main
record-processing loop using `poll(Duration.ZERO)`. This tightly-coupled design
leads to several critical issues:
1. **Missed START_COMMIT at startup**
Initial partition assignment delays cause workers to frequently miss the
first `START_COMMIT` from the coordinator → delayed snapshot creation at
destination.
2. **Throughput blocked by control-topic latency**
Any intermittent Kafka latency or outage on the control topic completely
blocks file writing → severe throughput degradation.
3. **Unreliable zero-duration polling**
`Duration.ZERO` only works reliably with constant traffic. When no
records are buffered on the broker, workers miss `START_COMMIT` cycles (both at
start and mid-stream).
4. **Wasteful processing of irrelevant events**
All workers consume the same control topic → every worker unnecessarily
processes coordinator events intended for other workers → significant CPU waste
and reduced throughput.
## Solution
This PR **completely decouples** control-topic processing from the main
record-processing flow by:
- Introducing **one dedicated consumer thread per worker** that exclusively
subscribes to the control topic.
- Pre-processing and caching coordinator events (`START_COMMIT`,
`COMMITTED`, etc.) in a thread-safe queue.
- Making these events instantly available to the main worker thread without
any polling in the hot path.
The main poll loop now only consumes data topics and is no longer blocked or
distracted by the control topic.
## Benefits
- Workers continue processing & writing records even during transient
control-topic / Kafka issues
- No processing of other workers' coordinator events
- **Zero missed START_COMMIT cycles** (startup or intermediate) → faster &
consistent snapshots
- **Significant throughput improvement** (~1.8–2.2× in load tests)
- **End-to-end latency reduced by ~commit-interval × 2**
- Memory impact is negligible → one extra thread per worker is fully
justified
## Validation
- Tested with 24–48 parallel workers under production-like load
- Verified no missed commits even with artificial delays on control topic
- Memory footprint remains within previous bounds
- All unit & integration tests pass
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]