lhotari opened a new pull request #9764:
URL: https://github.com/apache/pulsar/pull/9764


   ### Motivation
   
   Currently `ConcurrentOpenHashMap.forEach` keeps a read lock in each section 
for the duration of the processing. This can lead to various issues when I/O 
operations are done in the processing function. This is a common patter in 
Pulsar code base. The I/O usually happens to Zookeeper and it leads to threads 
piling up in waiting for locks in ConcurrentOpenHashMap when there are both 
reads and writes for the same section in concurrently executing threads. 
   It is also possible that long living locks lead to dead locks.
   
   This is the current implementation of `ConcurrentOpenHashMap.forEach`
   
https://github.com/apache/pulsar/blob/34ca8938ca13ea60b05d164c27d9755855caf87c/pulsar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentOpenHashMap.java#L157-L161
   and the referenced `Section.forEach`
   
https://github.com/apache/pulsar/blob/34ca8938ca13ea60b05d164c27d9755855caf87c/pulsar-common/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentOpenHashMap.java#L356-L395
   
   ### Modifications
   
   - Add a new method `ConcurrentOpenHashMap.forEachInSnapshot` which makes a 
copy of the entries in each section before processing the entries. 
   
   - Replace all current usages of `forEach` with `forEachInSnapshot`
   
   There are tradeoffs involved. Some memory allocations will have to be made 
to make a copy. However, allocations aren't a real issue. JVMs are really 
efficient for such short time allocations and the allocations aren't causing 
new bottlenecks or performance issues. 
   The other tradeoff is a related to data consistency. The entry might have 
already been removed from the collection at the time it is processed. This is 
also a non-issue in most cases since it would be a bad solution in the first 
place to determine state based on the membership in the collection. 
   
   Since the previous `forEach` implementation remains, it's possible to choose 
between the usage of `forEach` or `forEachInSnapshot` in each specific use 
case. If this feature isn't needed, it is possible to replace the `forEach` 
implemention with the snapshotting implemention completely.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to