Hi Karol,
I'll have a look. We are a bit busy with the next release for the 3.5 branch 
(the RC should go out this week), so don't worry if we don't say anything about 
the JIRAs in the next few days. If we take too long, feel free to ping me on 
this list again.
-Flavio 


On Monday, March 16, 2015 4:23 PM, "Dudzinski, Karol" <[email protected]> wrote:

Hi Flavio,

I've created a JIRA for this: 
https://issues.apache.org/jira/browse/ZOOKEEPER-2141
I'll shortly upload a patch to demonstrate the approach I was considering.

While I was at it, I submitted a few other JIRAs for some issues we've hit. 
I'm happy to submit patches for all of them, but I would appreciate some 
comments from the committers about the approaches, or even the validity of 
what I'm suggesting.

The other JIRAs are:
https://issues.apache.org/jira/browse/ZOOKEEPER-2142
https://issues.apache.org/jira/browse/ZOOKEEPER-2143
https://issues.apache.org/jira/browse/ZOOKEEPER-2144

Thanks,
Karol

-----Original Message-----
From: Flavio Junqueira [mailto:[email protected]] 
Sent: 26 February 2015 22:53
To: [email protected]
Cc: [email protected]
Subject: Re: What goes in the snapshot?

Hi Karol,

The use of reference counters might be a good way around it. To make it 
backward compatible, I think we can optionally use the counters if the third 
map is present in the snapshot. Would it work?
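
For illustration, the reference-counting idea could look roughly like the 
sketch below. This is only a sketch under stated assumptions, not ZooKeeper's 
actual code: the class RefCountedAclCache, its methods and the refCounts map 
are hypothetical, ACL entries are stood in for by plain strings, and only the 
names longKeyMap and aclKeyMap come from the quoted message. Snapshot 
serialisation, and the optional reading of a third map, are not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical reference-counted ACL cache (illustration only).
 * Strings stand in for real ACL entries.
 */
final class RefCountedAclCache {
    // id -> ACL list (per the thread, this is the map written to the snapshot)
    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    // ACL list -> id (reverse index; per the thread, not serialised)
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    // id -> number of nodes currently using that ACL (the proposed third map)
    private final Map<Long, Integer> refCounts = new HashMap<>();
    private long nextId = 0;

    /** Call when a node is created or has its ACL set; returns the ACL id. */
    synchronized long acquire(List<String> acl) {
        Long id = aclKeyMap.get(acl);
        if (id == null) {
            id = ++nextId;
            // Defensive copy so later changes by the caller cannot corrupt the key.
            List<String> copy = Collections.unmodifiableList(new ArrayList<>(acl));
            aclKeyMap.put(copy, id);
            longKeyMap.put(id, copy);
        }
        refCounts.merge(id, 1, Integer::sum);
        return id;
    }

    /** Call when a node is deleted or its ACL is replaced by another one. */
    synchronized void release(long id) {
        Integer count = refCounts.get(id);
        if (count == null) {
            return; // unknown id: nothing to release
        }
        if (count <= 1) {
            // Last reference gone: drop the entry from all three maps.
            refCounts.remove(id);
            List<String> acl = longKeyMap.remove(id);
            aclKeyMap.remove(acl);
        } else {
            refCounts.put(id, count - 1);
        }
    }

    /** Number of distinct ACLs currently cached. */
    synchronized int size() {
        return longKeyMap.size();
    }
}
```

With something along these lines, a client that creates and deletes nodes 
with one-off ACLs would leave the cache empty again once the nodes are gone, 
because every acquire is paired with a release.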

I also think it would be good to create a jira for this so that we can track 
this discussion and propose patches.

-Flavio

> On 26 Feb 2015, at 13:13, Karol Dudzinski <[email protected]> wrote:
> 
> Hi Flavio,
> 
> We've done some more analysis using the snapshot formatter and a heap dump 
> and have found the source of the snapshot bloat.
> 
> The majority of the space is taken by the longKeyMap from DataTree. In the 
> heap dump, aclKeyMap has as many entries (which is to be expected given how 
> the maps are used) and takes an equally large amount of space, though at 
> least aclKeyMap isn't serialised to the snapshot.
> 
> We use a custom authentication provider, but because the 
> AuthenticationProvider.matches method does not provide the path being 
> operated on, we end up sticking the path in the ACL id. Some of our apps 
> generate a lot of paths for one-time use, and consequently we end up with 
> lots of unique ACLs.
> 
> The two ACL maps in DataTree seem to be an optimisation so that repeated 
> usage of an ACL does not result in the full list being stored multiple 
> times. However, entries are never removed from these two maps, so if ACLs 
> are unique, the maps (and the snapshot) grow forever.
> 
> We're quite keen on fixing this as it's causing us lots of issues and we're 
> happy to provide a patch but will need your opinion on the various options:
> - create a third map which would hold a reference count for each ACL and 
> be updated as needed when creating a node, deleting a node or setting an 
> ACL.  When the reference count is 0, remove the entry from all the maps.
> - use weak references in some shape or form, though this is made harder by 
> the fact that the ACL optimisation essentially needs a bidirectional index 
> (hence the two maps).  We've given this one lots of thought but it would 
> really require something like a ConcurrentWeakBiHashMap, which just sounds 
> wrong and over-engineered :)
> 
> The other fix that could be made is to pass the path being operated on to the 
> AuthenticationProvider.  However, doing that in a backwards compatible 
> fashion is not trivial and even though it would fix my problem (by allowing 
> me to remove the path from the ACL id) it wouldn't fix the general problem 
> with this optimisation.
> 
> Looking forward to hearing your thoughts on this.
> 
> Thanks,
> Karol
> 
>> On 22 Feb 2015, at 14:55, Flavio Junqueira <[email protected]> 
>> wrote:
>> 
>> Hi Karol,
>> 
>> It's odd that you have such large snapshots and so little data in the data 
>> tree. Are you creating lots of sessions? Right now I can't think of a good 
>> reason, so I suggest you use the snapshot formatter to inspect the 
>> snapshot.
>> 
>> -Flavio
>> 
>>> On 22 Feb 2015, at 14:23, Karol Dudzinski <[email protected]> wrote:
>>> 
>>> Hi Flavio,
>>> 
>>> Yes, one of our clients had a bug which caused it to go into a tight 
>>> create/delete loop with zero net effect (i.e. it was deleting what it 
>>> had just created). After we stopped the client, the snapshot never reduced 
>>> in size, so are the deletes in there permanently?
>>> 
>>> Thanks,
>>> Karol
>>> 
>>> 
>>>> On 22 Feb 2015, at 14:05, Flavio Junqueira <[email protected]> 
>>>> wrote:
>>>> 
>>>> Hi there,
>>>> 
>>>> Perhaps a lot of data has been deleted? In any case, you may want to use 
>>>> the SnapshotFormatter to check what is in the large snapshot.
>>>> 
>>>> -Flavio
>>>> 
>>>>> On 22 Feb 2015, at 10:44, Karol Dudzinski <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I was under the impression that the snapshot contained essentially an 
>>>>> on-disk copy of all the data.  However, one of our clusters has a 
>>>>> snapshot which is over 1GB while the mntr four letter word reports an 
>>>>> approximate data size in the hundreds of KB and a node count in the low 
>>>>> thousands.  So what else goes into the snapshot and how can I slim it 
>>>>> down?
>>>>> 
>>>>> Thanks,
>>>>> Karol