hanm commented on issue #1098: ZOOKEEPER-3560: Add response cache to serve get 
children (2) requests.
URL: https://github.com/apache/zookeeper/pull/1098#issuecomment-547210748
 
 
   >> serialization is a cpu-cost operation? 
   
   Serialization is not just CPU bound; it's also memory bound. Most JVM uses a 
generational garbage collector where it treats objects on heap differently 
based on their age. The response objects (and the reachable objects like 
buffers) are very short lived, the serialization code path will create many 
such short lived objects through allocations from Eden space. When QPS is high, 
the Eden (and survivor) spaces will be filled up much faster, forced JVM to 
promote these short lived objects to the older generation at a young age than 
the normal tenuring threshold, which could lead to excessive fragmentation of 
the old generation, and this creates all sorts of problems for the actual GC 
(CMS, for example), such as increased GC time and sometimes full GC.
   
   >> Putting more thing into the cache(memory) will hava more likely to 
trigger the full GC(STW)?
   
   Yes, if not choosing properly, caching would lead to more memory pressure as 
this cache feature essentially reduces the available heap space. But, when 
choosing properly, this feature will benefit both request serving, and improve 
GC.
   
   >> I also doubt about: adding this cache will bring about some issues about 
read inconsistency
   
   We invalidate cache when the cached data (zNode) changed; I don't see a 
particular consistency issue here. The sequential read consistency is still 
guaranteed. If you think there is a read inconsistency issue, do you mind 
providing a concrete example to illustrate the problem?
   
   >> I also do a benchmark for this feature
   
   Nice work, thanks for spending time doing that. 
   
   I want to point out that writing a benchmark for ZooKeeper (and in general, 
for any non trivial systems) is hard - as there are too many variables in the 
equation. For your benchmark, did you measure other metrics (request latency, 
gc/msec/cycles) with more types of workload (e.g. more read heavy - 1k per sec 
is nothing for ZK, try 50k or 100k) with different sizes of cache (at least, we 
should measure 400, which is the default value, instead of 1000)... and this 
list can go on and on. Of course, we have to respect the existing insights the 
(limited) benchmark provides, but it would be premature to conclude how the 
feature would perform without benchmark all dimensions extensively. 
   
   An extensive benchmark could provide a great deal of insights on performance 
characteristics of a system under different workloads and configurations, but 
the real test is on production environment. This caching feature works great 
for the clusters I operate and I expect similar success stories from @enixon 
who contributed the original feature.
   
   >> it should be enable by default?
   
   I'd like to have the feature enable by default based on the observations and 
experience of running this feature on some of our production cluster with 
default value with a variety of workloads. Disabling such a feature will be 
sad, considering lots of ZooKeeper users might not even aware of such a feature 
due to the monstrous amount of configuration options of ZooKeeper. That said, I 
am flexible on disabling the caching feature by default if community consent 
so, in which case I'd propose that being done in a different JIRA consider this 
pull request is ready to land.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to