Re: Review Request 33623: Ensure deletion doesn't break coalesced hashing

Gordon Sim Tue, 28 Apr 2015 11:50:25 -0700


> On April 28, 2015, 2:33 p.m., Alan Conway wrote:
> > proton-c/src/object/map.c, line 322
> > <https://reviews.apache.org/r/33623/diff/2/?file=944061#file944061line322>
> >
> >     Just to see if I understand the problem:
> >     
> >     The old code moves the "next" entry into the current slot. This breaks 
> > if the next entry is in allocated space and is the head of a coalesced 
> > chain as the latter part of that chain is now under the wrong bucket.
> >     
> >     So a simple optimization is to keep the original solution if (next > 
> > allocated) That is a very cheap way to check if there is a coalesced chain 
> > to worry about.
> 
> Gordon Sim wrote:
>     Yes, I believe that is right, you can move the next entry as long as it 
> is outside the addressable region. I'll make (and test) that change.
> 
> Andrew Stitcher wrote:
>     I don't think that test is correct: nodes in the hash chain can be in 
> both the "addressable" and "non-addressable" region, it's just that you don't 
> use the addressable region unless there is no more space in the 
> "non-addressable" region. The scan for a free table entry starts at the last 
> entry but can reach all the way to the first table entry not just the first 
> entry in the non-addressable region.
> 
> Gordon Sim wrote:
>     I don't understand your point.
> 
> Andrew Stitcher wrote:
>     I'm trying to say that Alan's suggested optimisation is incorrect.
> 
> Gordon Sim wrote:
>     Can you elaborate as to why not? Above you state the nodes can be in the 
> addressable region as well as outside it, which is obvious, but you don't say 
> why it is incorrect to move an entry out of the non-addressable region. I 
> believe it is moving addressable entries that is the problem (as this is what 
> breaks the link to the hashing).
> 
> Andrew Stitcher wrote:
>     It's because the entries addessable and nonaddressable can be 
> interleaved. If all the nonaddressable entries came before any addressable 
> entry in the chain then the optimisation would be safe.
>     
>     But in the presence of deletion you could use up all the cellar entries; 
> create an entry in the addressable region; then have a cellar entry removed; 
> and then have that linked into the chain after the addressable entry. This 
> would mean that only checking the next entry is not sufficient to be sure 
> that no coalescing has happened further down the chain (so you need to walk 
> the entire chain anyway).
>     
>     If you can make it an invariant that *all* entries in the chain that are 
> addressable (hence possibly coalesced) after after *all* entries in the chain 
> that are in the cellar then this optimisation is safe. It is possible to 
> ensure this invariant, but the current code doesn't so it.
>     
>     Does this make sense, or did I misunderstand the optimisation suggestion?


The optimisation proposed would only affect the next entry, not any chain 
following that. The test of whether the next entry is in the cellar merely 
indicates whether that entry can be safely moved, not whether it is part of any 
coalesced chain (which it well might be).

So when Alan says it 'is a very cheap way to check if there is a coalesced 
chain to worry about', I believe the 'to worry about' part is the key. Its not 
that it is not a coalesced chain, its that moving the entry doesn't affect the 
addressability of any entry within it.

Do you agree?


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33623/#review81814
-----------------------------------------------------------


On April 28, 2015, 6:35 p.m., Gordon Sim wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33623/
> -----------------------------------------------------------
> 
> (Updated April 28, 2015, 6:35 p.m.)
> 
> 
> Review request for qpid, Alan Conway, Andrew Stitcher, and Rafael Schloming.
> 
> 
> Bugs: PROTON-858
>     https://issues.apache.org/jira/browse/PROTON-858
> 
> 
> Repository: qpid-proton-git
> 
> 
> Description
> -------
> 
> The internal map implementation for proton-c uses coalesced hashing, but the 
> current deletion algorithm does not take account of this. On deletion of 
> entries, the map can lose its ability to locate other entries by key.
> 
> This patch is the simplest possible fix. It frees the deleted entry, makes 
> any previous entry that points to this tha tail of its chain, and then 
> relocates any entries in a chain following the deleted entry. Though I 
> believe this to be correct and safe, it does involve more work on some 
> deletions. The approach can be optimised further, either be reducing the 
> amount of chain that needs relocated, or by marking records as deleted in the 
> first instance and only rehashing later on some criteria.
> 
> (I started off trying to do the latter, but it is more complicated and I 
> still haven't got my patch fully correct. When I get that working I'll post 
> that also. It is a more involved change though, so perhaps we may want to 
> consider this simpler approach for 0.9.1).
> 
> 
> Diffs
> -----
> 
>   proton-c/src/object/map.c 1a758a1 
>   proton-c/src/tests/object.c 2b213c5 
> 
> Diff: https://reviews.apache.org/r/33623/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Gordon Sim
> 
>

Re: Review Request 33623: Ensure deletion doesn't break coalesced hashing

Reply via email to