[ https://issues.apache.org/jira/browse/ZOOKEEPER-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134579#comment-13134579 ]

Vikas Mehta commented on ZOOKEEPER-1177:
----------------------------------------

Hi Patrick,

   Sorry for the late response. Let me know if you were looking for something 
different from what I answered:

1) in your testing what is the impact of triggering a large number of watches 
on overall operation latency?

[vikas] Without this change, with a large number of watches, ZooKeeper would 
run out of memory storing all of them. With this change, it is now bounded by 
network bandwidth. One difference between the new and the old implementation is 
that triggerWatch() now loops through all the watchers to find the watches it 
needs to trigger, whereas the previous version used a reverse map to avoid this 
scan, a slight benefit when the number of watches per path is much smaller than 
the number of watchers.
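
For illustration, here is a minimal sketch of that trade-off (not the actual 
patch; all names are hypothetical): with only a watcher-to-paths map, 
triggerWatch() must scan every watcher, O(number of watchers), instead of doing 
a direct lookup in a path-to-watchers reverse map.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class ScanningWatchManager<W> {
        // Only the watcher -> watched-paths map is kept, saving the
        // memory of a reverse (path -> watchers) map.
        private final Map<W, Set<String>> watch2Paths = new HashMap<>();

        void addWatch(W watcher, String path) {
            watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
        }

        // Returns the watchers to notify; watches are one-shot, so the
        // path is removed from each matching watcher's set.
        Set<W> triggerWatch(String path) {
            Set<W> triggered = new HashSet<>();
            for (Map.Entry<W, Set<String>> e : watch2Paths.entrySet()) {
                if (e.getValue().remove(path)) { // linear scan of all watchers
                    triggered.add(e.getKey());
                }
            }
            return triggered;
        }
    }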

2) Say I delete a znode in your example, that will trigger 10k notifications to 
be sent (one to each session) - what is the impact on the latency of this 
request (the delete), both with and without this patch?

[vikas] Without the patch, as mentioned above, we are not able to run ZooKeeper 
with this many watches. If we do not have too many watchers (or watches 
overall), the impact of this change would be a linear scan of the watchers to 
identify the watches that need to be triggered for the update/delete operation.

3) Subsequent to the investigations you've been doing, should we have concerns 
on overall service availability due to large numbers of watches being triggered 
concurrently?

[vikas] We are thinking of implementing some throttling on the server (and 
perhaps later on the client side as well) to prevent deterioration in ZooKeeper 
performance or availability.
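
One possible shape such throttling could take (purely illustrative; the comment 
does not specify a mechanism, and all names here are hypothetical) is a bounded 
permit scheme on outgoing watch notifications:

    import java.util.concurrent.Semaphore;

    class NotificationThrottle {
        // Cap on watch notifications that may be in flight at once.
        private final Semaphore permits;

        NotificationThrottle(int maxInFlight) {
            permits = new Semaphore(maxInFlight);
        }

        // Block until a slot frees up before sending a notification.
        void acquire() throws InterruptedException {
            permits.acquire();
        }

        // Release the slot once the notification has been written out.
        void release() {
            permits.release();
        }
    }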

Thanks,
Vikas
                
> Enabling a large number of watches for a large number of clients
> ----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1177
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1177
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Vishal Kathuria
>            Assignee: Vishal Kathuria
>             Fix For: 3.5.0
>
>         Attachments: ZooKeeper.patch
>
>
> In my ZooKeeper setup, I see the watch manager consuming several GB of 
> memory, so I dug a bit deeper.
> In the scenario I am testing, I have 10K clients connected to an observer. 
> There are about 20K znodes in ZooKeeper, each about 1K, so about 20M of data 
> in total.
> Each client fetches and puts watches on all the znodes. That is 200 million 
> watches.
> A single watch seems to take about 100 bytes. I am currently at 14528037 
> watches, and according to the YourKit profiler, WatchManager already has 
> 1.2G. This is not going to work, as it might end up needing 20G of RAM just 
> for the watches.
> So we need a more compact way of storing watches. Here are the possible 
> solutions.
> 1. Use a bitmap instead of the current hashmap. In this approach, each znode 
> would get a unique id when it gets created. For every session, we keep a 
> bitmap that indicates the set of znodes this session is watching. A bitmap, 
> assuming 100K znodes, would be about 12K. For 10K sessions, we can keep track 
> of watches using 120M instead of 20G (see the sketch after this description).
> 2. This second idea is based on the observation that clients watch znodes in 
> sets (for example, all znodes under a folder). Multiple clients watch the 
> same set, and the total number of sets is a couple of orders of magnitude 
> smaller than the total number of znodes. In my scenario, there are about 100 
> sets. So instead of keeping track of watches at the znode level, keep track 
> of them at the set level. It may mean that gets also need to be implemented 
> at the set level. With this, we can store the watches in about 100M.
> Are there any other suggested solutions?
> Thanks
>  
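
For illustration, a minimal sketch of solution 1 above, assuming znodes get 
dense integer ids at creation (all names here are hypothetical, not the actual 
patch):

    import java.util.BitSet;
    import java.util.HashMap;
    import java.util.Map;

    class BitmapWatchManager {
        // One bitmap per session; bit i is set if the session watches
        // the znode with id i. 100K znodes is about 12.5K per session,
        // so 10K sessions take roughly 125M in total, versus roughly
        // 200M watches * 100 bytes = 20G for per-watch hashmap entries.
        private final Map<Long, BitSet> watchesBySession = new HashMap<>();

        void addWatch(long sessionId, int znodeId) {
            watchesBySession.computeIfAbsent(sessionId, s -> new BitSet())
                            .set(znodeId);
        }

        // Triggering a watch still means checking every session, but
        // each check is a single bit test instead of a hash lookup.
        boolean isWatching(long sessionId, int znodeId) {
            BitSet bits = watchesBySession.get(sessionId);
            return bits != null && bits.get(znodeId);
        }

        void clearWatch(long sessionId, int znodeId) {
            BitSet bits = watchesBySession.get(sessionId);
            if (bits != null) {
                bits.clear(znodeId);
            }
        }
    }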
