li4wang commented on PR #2086:
URL: https://github.com/apache/zookeeper/pull/2086#issuecomment-1853151200

   We also looked into how to enable Prometheus metrics in production and ran 
quite a lot of perf tests and profiling recently. Our findings:
   
   1. The metrics queue size is 1M by default and can be tuned. A queue size of 
1M seems too large. When we reduced the queue size from 1M to 100K, the max GC 
pause dropped by 78% and the GC count dropped by 80%.
   
   2. We also noticed that when the thread pool queue is full, a large number 
of `RejectedExecutionException` instances are created, which adds more GC 
overhead. This is because `ThreadPoolExecutor` uses `AbortPolicy` as its default 
`RejectedExecutionHandler`. `AbortPolicy` instantiates a 
`RejectedExecutionException` and makes two quite involved `toString` calls:
   
   ```
           public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
               throw new RejectedExecutionException("Task " + r.toString() +
                                                    " rejected from " +
                                                    e.toString());
           }
   ```
   
   3. We created a patch that uses `DiscardPolicy` instead of `AbortPolicy`, 
which silently drops the rejected task instead of throwing 
`RejectedExecutionException`. With the patch, the max GC pause was reduced 
by about a further 7% and the GC count by about 61% for a 100K queue size. As 
a result, read latency was reduced by 59% and throughput increased by 140%.
   
   
   
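   The difference between the two rejection policies can be reproduced with a 
small standalone program. This is only a minimal sketch, not the patch itself: 
the class name `RejectionPolicyDemo`, the helper `countRejections`, the single 
worker thread and the tiny queue capacity are all assumptions chosen to make 
the behaviour easy to observe. It blocks the only worker, overfills a bounded 
queue, and counts how many `RejectedExecutionException` instances reach the 
caller.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of ZooKeeper.
public class RejectionPolicyDemo {

    // Submits `tasks` blocking tasks to a one-thread pool with a bounded queue
    // and returns how many RejectedExecutionExceptions the caller observed.
    public static int countRejections(RejectedExecutionHandler handler,
                                      int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), handler);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Every task waits on the latch, so the worker stays busy
                    // and the queue stays full while we keep submitting.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // AbortPolicy allocates an exception (and builds its
                    // message) for every task that does not fit.
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 1 running task + 2 queued tasks fit; the other 7 are rejected.
        System.out.println("AbortPolicy exceptions:   "
                + countRejections(new ThreadPoolExecutor.AbortPolicy(), 2, 10));
        System.out.println("DiscardPolicy exceptions: "
                + countRejections(new ThreadPoolExecutor.DiscardPolicy(), 2, 10));
    }
}
```

   With these numbers `AbortPolicy` throws 7 exceptions and `DiscardPolicy` 
throws none. The trade-off is that `DiscardPolicy` drops the rejected tasks 
silently, so in a metrics pipeline some samples are lost without any signal 
unless drops are counted separately.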
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
