HenryCaiHaiying opened a new issue, #16361:
URL: https://github.com/apache/iceberg/issues/16361

   ### Apache Iceberg version
   
   1.10.1 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   There is a performance problem in the Coordinator's commit-readiness check 
that significantly degrades system performance when there is a backlog of 
messages to process on the control topic.
   
   During each commit cycle, the Coordinator reads each DATA_COMPLETE message 
from the control topic and calls the commitState.isCommitReady() method to see 
whether all topic partitions are represented in those messages.  However, the 
check is done by looping over all previous messages.  If there are n 
DATA_COMPLETE messages, this is an O(n^2) calculation.
   
   When everything runs smoothly, n is usually bounded by the number of workers 
in the Kafka Connect cluster, but when a backlog builds up on the control 
topic, things spiral downward.  The backlog often starts when a networking or 
Hive Metastore (HMS) availability issue prevents the Coordinator from 
committing entries to HMS.  The commit fails, and the retry on the next commit 
cycle needs to process 2n messages from the control topic (because the workers 
keep generating them).  The inefficient processing in 
CommitState.isCommitReady(), coupled with the increased number of Kafka 
messages to be processed from the control topic, makes the next commit cycle 
more prone to failure.
   
   The fix for this performance issue is simple: we just need to keep a map in 
CommitState that caches the topic partition names we have seen so far.  Once 
the size of the map reaches the expected count, the commit is ready.  This 
should be an O(n) calculation.
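
A minimal sketch of the idea, not the actual CommitState code: the class name 
CommitReadinessTracker and its methods are hypothetical, and a HashSet of 
"topic-partition" strings stands in for the map described above. It assumes 
each DATA_COMPLETE message reports a topic and the partitions assigned to the 
worker that sent it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not the actual Iceberg CommitState API.
final class CommitReadinessTracker {
  private final int expectedPartitionCount;
  // Distinct "topic-partition" keys seen in the current commit cycle.
  private final Set<String> seenPartitions = new HashSet<>();

  CommitReadinessTracker(int expectedPartitionCount) {
    this.expectedPartitionCount = expectedPartitionCount;
  }

  // Records the partitions reported by one DATA_COMPLETE message.
  // Each add is O(1) on average, and duplicates are ignored by the set.
  void addDataComplete(String topic, int... partitions) {
    for (int partition : partitions) {
      seenPartitions.add(topic + "-" + partition);
    }
  }

  // Constant-time check: no rescan of earlier messages.
  boolean isCommitReady() {
    return seenPartitions.size() >= expectedPartitionCount;
  }

  // Resets the tracker when a new commit cycle starts.
  void clear() {
    seenPartitions.clear();
  }
}
```

With this shape, processing n messages costs O(n) in total, because the 
readiness check after each message only compares the set size with the 
expected count.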
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

