Re: Review Request 33623: Ensure deletion doesn't break coalesced hashing

Andrew Stitcher Tue, 28 Apr 2015 12:09:11 -0700


> On April 28, 2015, 2:33 p.m., Alan Conway wrote:
> > proton-c/src/object/map.c, line 322
> > <https://reviews.apache.org/r/33623/diff/2/?file=944061#file944061line322>
> >
> >     Just to see if I understand the problem:
> >     
> >     The old code moves the "next" entry into the current slot. This breaks 
> > if the next entry is in allocated space and is the head of a coalesced 
> > chain as the latter part of that chain is now under the wrong bucket.
> >     
> >     So a simple optimization is to keep the original solution if (next > 
> > allocated) That is a very cheap way to check if there is a coalesced chain 
> > to worry about.
> 
> Gordon Sim wrote:
>     Yes, I believe that is right, you can move the next entry as long as it 
> is outside the addressable region. I'll make (and test) that change.
> 
> Andrew Stitcher wrote:
>     I don't think that test is correct: nodes in the hash chain can be in 
> both the "addressable" and "non-addressable" region, it's just that you don't 
> use the addressable region unless there is no more space in the 
> "non-addressable" region. The scan for a free table entry starts at the last 
> entry but can reach all the way to the first table entry not just the first 
> entry in the non-addressable region.
> 
> Gordon Sim wrote:
>     I don't understand your point.
> 
> Andrew Stitcher wrote:
>     I'm trying to say that Alan's suggested optimisation is incorrect.
> 
> Gordon Sim wrote:
>     Can you elaborate as to why not? Above you state the nodes can be in the 
> addressable region as well as outside it, which is obvious, but you don't say 
> why it is incorrect to move an entry out of the non-addressable region. I 
> believe it is moving addressable entries that is the problem (as this is what 
> breaks the link to the hashing).
> 
> Andrew Stitcher wrote:
>     It's because the entries addessable and nonaddressable can be 
> interleaved. If all the nonaddressable entries came before any addressable 
> entry in the chain then the optimisation would be safe.
>     
>     But in the presence of deletion you could use up all the cellar entries; 
> create an entry in the addressable region; then have a cellar entry removed; 
> and then have that linked into the chain after the addressable entry. This 
> would mean that only checking the next entry is not sufficient to be sure 
> that no coalescing has happened further down the chain (so you need to walk 
> the entire chain anyway).
>     
>     If you can make it an invariant that *all* entries in the chain that are 
> addressable (hence possibly coalesced) after after *all* entries in the chain 
> that are in the cellar then this optimisation is safe. It is possible to 
> ensure this invariant, but the current code doesn't so it.
>     
>     Does this make sense, or did I misunderstand the optimisation suggestion?
> 
> Gordon Sim wrote:
>     The optimisation proposed would only affect the next entry, not any chain 
> following that. The test of whether the next entry is in the cellar merely 
> indicates whether that entry can be safely moved, not whether it is part of 
> any coalesced chain (which it well might be).
>     
>     So when Alan says it 'is a very cheap way to check if there is a 
> coalesced chain to worry about', I believe the 'to worry about' part is the 
> key. Its not that it is not a coalesced chain, its that moving the entry 
> doesn't affect the addressability of any entry within it.
>     
>     Do you agree?


Yes, I do agree. I think that as long as you avoid cutting a non-cellar entry 
out of the chain you can't break any coalesced chain.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33623/#review81814
-----------------------------------------------------------


On April 28, 2015, 6:35 p.m., Gordon Sim wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33623/
> -----------------------------------------------------------
> 
> (Updated April 28, 2015, 6:35 p.m.)
> 
> 
> Review request for qpid, Alan Conway, Andrew Stitcher, and Rafael Schloming.
> 
> 
> Bugs: PROTON-858
>     https://issues.apache.org/jira/browse/PROTON-858
> 
> 
> Repository: qpid-proton-git
> 
> 
> Description
> -------
> 
> The internal map implementation for proton-c uses coalesced hashing, but the 
> current deletion algorithm does not take account of this. On deletion of 
> entries, the map can lose its ability to locate other entries by key.
> 
> This patch is the simplest possible fix. It frees the deleted entry, makes 
> any previous entry that points to this tha tail of its chain, and then 
> relocates any entries in a chain following the deleted entry. Though I 
> believe this to be correct and safe, it does involve more work on some 
> deletions. The approach can be optimised further, either be reducing the 
> amount of chain that needs relocated, or by marking records as deleted in the 
> first instance and only rehashing later on some criteria.
> 
> (I started off trying to do the latter, but it is more complicated and I 
> still haven't got my patch fully correct. When I get that working I'll post 
> that also. It is a more involved change though, so perhaps we may want to 
> consider this simpler approach for 0.9.1).
> 
> 
> Diffs
> -----
> 
>   proton-c/src/object/map.c 1a758a1 
>   proton-c/src/tests/object.c 2b213c5 
> 
> Diff: https://reviews.apache.org/r/33623/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Gordon Sim
> 
>

Re: Review Request 33623: Ensure deletion doesn't break coalesced hashing

Reply via email to