[GitHub] [pulsar] merlimat commented on pull request #9764: [Common] Reduce duration of locks when processing items held in ConcurrentOpenHashMaps

GitBox Tue, 02 Mar 2021 11:08:27 -0800


merlimat commented on pull request #9764:
URL: https://github.com/apache/pulsar/pull/9764#issuecomment-789143224



   > In this case, the memory allocated in forEachInSnapshot is a tiny fraction 
of all memory allocations happening. The ConcurrentOpenHashMap.forEach calls 
aren't on the fast path of processing where a lot of garbage could be 
generated. The amount of memory allocations are in the order of a few kilobytes 
per second since it's only about a few object references that consume memory 
(about 28 bytes per item: 16+4+4 bytes for Entry, 4 bytes for each item in the 
ArrayList). Is GC pressure really a problem in this case and how .forEach is 
used in the Pulsar code base?
   
   This map is used in many places throughout the code base and, depending on 
the use cases, there can be instances with a lot of entries or many instances 
with few entries. Also, the rate of this scanning through the map could be 
quite high in some cases.
   
   In light of that, I'm not convinced that a model where we always defensively 
copy the entire map when we need to do a map scan is a good general option, or 
a good replacement in the current code where most places assumed the scanning 
not be expensive (allocations-wise), even though there might be few specific 
cases in which it's appropriate.
   
   If we do have blocking in that code that process the map items, and that 
cannot be easily removed, I believe a better alternative would be to make the 
scanning more flexible.
   
   For example, we're not expecting to do a scan over a consistent snapshot of 
the map (for the simple reason that the map is broken down into independent 
sections). We only need to maintain the consistency between a key and it's 
associated value.
   
   With that in mind, we don't strictly need to get a read lock for the entire 
duration of scanning the whole section.
   We could instead use different approaches: 
    * Grab the stamp once per section and upgrade to read-lock only if there 
are write occurring
    * Do the same as a get, with stamp and fallback to read-lock per each item 
we're reading
   
   Another alternative could be to measure the time spent in the process 
function and decide based on that wether to make a copy or keep the read lock 
during the scan.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [pulsar] merlimat commented on pull request #9764: [Common] Reduce duration of locks when processing items held in ConcurrentOpenHashMaps

Reply via email to