merlimat commented on pull request #9764:
URL: https://github.com/apache/pulsar/pull/9764#issuecomment-789930324


   > It's not really a defensive copy of the entire map. The copy is done one 
section at a time. 
   
   By defensive, I mean that we'd be taking a copy of the entire map (section 
by section) each time we need to traverse it, even though there would be no 
need to do so (eg: when a section is not being modified). For a map that is 
not updated super-frequently, that would impose a sizable overhead.
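   To make the overhead concrete, here is a minimal sketch (illustrative names, not the actual `ConcurrentOpenHashMap` code) of a sectioned map whose `forEach` snapshots one section at a time under that section's read lock and then runs the callback with no lock held. The per-section copy is the "defensive copy" being discussed: it is allocated on every traversal, whether or not any writer touched the section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.StampedLock;
import java.util.function.Consumer;

class SectionedMap<K, V> {
    static class Section<K, V> {
        final StampedLock lock = new StampedLock();
        final List<K> keys = new ArrayList<>();
        final List<V> values = new ArrayList<>();
    }

    @SuppressWarnings("unchecked")
    private final Section<K, V>[] sections = new Section[4];

    SectionedMap() {
        for (int i = 0; i < sections.length; i++) {
            sections[i] = new Section<>();
        }
    }

    void put(K key, V value) {
        Section<K, V> s = sections[(key.hashCode() & 0x7fffffff) % sections.length];
        long stamp = s.lock.writeLock();
        try {
            s.keys.add(key);
            s.values.add(value);
        } finally {
            s.lock.unlockWrite(stamp);
        }
    }

    // Copy each section under its read lock, then run the callback lock-free.
    void forEachValue(Consumer<V> processor) {
        for (Section<K, V> s : sections) {
            List<V> snapshot;
            long stamp = s.lock.readLock();
            try {
                snapshot = new ArrayList<>(s.values); // defensive copy, per section
            } finally {
                s.lock.unlockRead(stamp);
            }
            snapshot.forEach(processor); // no lock held while processing
        }
    }

    public static void main(String[] args) {
        SectionedMap<String, Integer> m = new SectionedMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.forEachValue(System.out::println);
    }
}
```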
   
   > Memory allocations and GC is extremely efficient in modern JVMs. 
   
   I've been battling GC and allocations for many years now, and while that 
might be true to a certain extent, there are also many bad GC behaviors that 
we have fixed with changes like switching from the regular `ConcurrentHashMap` 
to this custom implementation.
   
   > Since it's about 28 bytes per entry, there would have to be millions of 
entries for the memory allocation overhead to have any noticeable significance. 
If necessary, this could be benchmarked.
   > Since there aren't millions of entries, but usually a few thousand 
entries, it has even less significance. This is why I don't see a reason to 
avoid the copying in this case.
   
   The problem is that there can easily be millions of entries (aggregated 
from hundreds of thousands of maps) on a single instance. That has been the 
case in many production deployments for many years.
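   For scale, taking the ~28 bytes/entry figure quoted above at face value (the entry counts here are illustrative, not measurements):

```java
// Back-of-the-envelope for the copy overhead being debated, using the
// ~28 bytes per entry figure quoted above. Entry counts are illustrative.
public class CopyOverhead {
    static long copyBytesPerScan(long entries, long bytesPerEntry) {
        return entries * bytesPerEntry;
    }

    public static void main(String[] args) {
        // A few thousand entries: ~140 KB per scan, negligible.
        System.out.println(copyBytesPerScan(5_000, 28));
        // Millions of entries, as on large brokers: ~28 MB of garbage per scan.
        System.out.println(copyBytesPerScan(1_000_000, 28));
    }
}
```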
   
   > We do have IO operations (interactions with Zookeeper) in most of the code 
that processes the map items and it cannot be easily removed.
   
   Sure, but we should take care of them (eg: during the refactoring to port 
these components to use MetadataStore instead of direct ZK access).
   
   Also, I was suggesting that we could do it such that we automatically 
switch to individual locks (instead of the whole-section lock) if the 
processing is taking more than X amount of time.
   
   > It reduces the duration of the locks in the case that there are no writes. 
When there are writes, the .forEach processing will hold the read lock until 
all items have been processed. This is one of the reasons why the problems are 
rare with a small number of items or a low amount of write operations. However, 
such a solution won't completely prevent deadlocks, if the model is such that 
it could lead to a deadlock that isn't detected or resolved after some timeout. 
A deadlock will eventually happen even if the probability is low. Currently the 
Pulsar broker locks up and doesn't resume operations if there's a deadlock. 
That's what makes the problem severe. Since the locks could be in specific 
sections of a ConcurrentOpenHashMap, it leads to partial unavailability which 
is hard to detect.
   
   While it's clear that a slow scan will cause a spike in write latency on 
the map, I'm still not clear on the exact deadlock scenario here.
   
   Typically, the way to get deadlocked on the map is if, while doing a scan, 
you try to access it again from a different thread (or even the same thread, 
since the lock is non-reentrant) and that blocks the processing function.
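   The same-thread case can be shown in a few lines. This is a hedged illustration, not the actual map code: `StampedLock` (the kind of non-reentrant lock in question) will not grant the write lock to a thread that already holds the read lock, so a processing function that calls back into its own section would hang. The demo uses `tryWriteLock()` so it terminates instead of deadlocking.

```java
import java.util.concurrent.locks.StampedLock;

public class NonReentrantDemo {
    public static boolean reentrantWriteSucceeds() {
        StampedLock lock = new StampedLock();
        long readStamp = lock.readLock(); // the scan holds the section read lock
        try {
            // Processing function re-enters the same section, e.g. via map.put(...).
            // A blocking writeLock() would hang forever here; tryWriteLock()
            // returns 0 instead, because the read lock blocks write acquisition
            // even for the same thread.
            long writeStamp = lock.tryWriteLock();
            if (writeStamp != 0) {
                lock.unlockWrite(writeStamp);
                return true;
            }
            return false;
        } finally {
            lock.unlockRead(readStamp);
        }
    }

    public static void main(String[] args) {
        System.out.println("write lock acquired while read-locked: "
                + reentrantWriteSucceeds()); // prints "false"
    }
}
```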
   
   > I see now, this is a case of deadlock, we are issuing a blocking call to 
ZooKeeper into a forEach, and the ZooKeeper library handles every operation in 
only one single thread.
   > So the chain is pretty hard to understand
   
   @eolivelli Even if we do a blocking ZK call (which we shouldn't), that 
doesn't necessarily mean there will be a deadlock. It will only be the case if 
the ZK callback thread is blocked trying to read/write the same map. Do you 
have the stacks of the other threads?
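   That two-thread chain can be sketched with plain JDK primitives standing in for Pulsar and the ZK client (all names here are illustrative). Thread A, the forEach, holds a section read lock and blocks on a future; the single "ZK callback" thread must write to the same section before it can complete that future. Real code would hang; the demo uses timeouts on both sides so the deadlock shows up as a timeout instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.StampedLock;

public class ZkCallbackDeadlockDemo {
    public static boolean deadlocks() throws Exception {
        StampedLock sectionLock = new StampedLock();
        CompletableFuture<Void> zkReply = new CompletableFuture<>();
        // Stand-in for the single ZK event/callback thread.
        ExecutorService zkCallbackThread = Executors.newSingleThreadExecutor();
        try {
            long readStamp = sectionLock.readLock(); // forEach holds the section lock
            // The "callback" needs the write lock on the same section
            // before it can deliver the reply.
            zkCallbackThread.submit(() -> {
                long ws = 0;
                try {
                    ws = sectionLock.tryWriteLock(200, TimeUnit.MILLISECONDS);
                } catch (InterruptedException ignored) {
                }
                if (ws != 0) {
                    sectionLock.unlockWrite(ws);
                    zkReply.complete(null);
                }
            });
            try {
                // Blocking "ZK call" inside forEach: waits on the callback thread,
                // which in turn waits on the lock we are holding.
                zkReply.get(500, TimeUnit.MILLISECONDS);
                return false; // no deadlock
            } catch (TimeoutException e) {
                return true; // each side was waiting on the other
            } finally {
                sectionLock.unlockRead(readStamp);
            }
        } finally {
            zkCallbackThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deadlock observed: " + deadlocks());
    }
}
```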
   
   To conclude, in my opinion the easiest, lowest-risk, lowest-overhead change 
here would be to switch `forEach()` to do the stamped + read-lock combination 
for each entry, instead of the current per-section approach.
   
   That will ensure that the process function is always called without holding 
a lock on the map. It can lead to spurious inconsistencies during re-hashing, 
but I don't think that will be a problem. In any case, that can be solved by 
falling back to the current implementation's behavior for those specific cases.
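   A rough sketch of that suggestion, again with illustrative names rather than the actual Pulsar code: each slot is read with `tryOptimisticRead()`, falling back to a short read-lock only when a writer raced, and the processing function always runs with no lock held. (A real hash map would read from its fixed-size table; an `ArrayList` keeps the sketch short.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.StampedLock;
import java.util.function.Consumer;

class PerEntryForEach<V> {
    private final StampedLock lock = new StampedLock();
    private final List<V> slots = new ArrayList<>();

    void add(V value) {
        long stamp = lock.writeLock();
        try {
            slots.add(value);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    void forEach(Consumer<V> processor) {
        int size;
        long stamp = lock.readLock();
        try {
            size = slots.size();
        } finally {
            lock.unlockRead(stamp);
        }
        for (int i = 0; i < size; i++) {
            // Optimistic read of a single slot: no lock held unless a writer raced.
            long optStamp = lock.tryOptimisticRead();
            V value = slots.get(i);
            if (!lock.validate(optStamp)) {
                // A write happened during the optimistic read: retry this one
                // slot under the read lock, held only for the re-read.
                long readStamp = lock.readLock();
                try {
                    value = slots.get(i);
                } finally {
                    lock.unlockRead(readStamp);
                }
            }
            processor.accept(value); // the lock is never held here
        }
    }
}
```

   The trade-off matches the one described above: writers are never blocked for the duration of the scan, at the cost of possibly observing a slot mid-update (resolved by the validate-and-retry) or inconsistencies across slots during re-hashing.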
   
   
   
   

