On 16 May 2015 at 07:48, Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> wrote:
> On Fri, May 15, 2015 at 7:25 PM, Philip Martin <philip.mar...@wandisco.com>
> wrote:
>>
>> Philip Martin <philip.mar...@wandisco.com> writes:
>>
>> > Philip Martin <philip.mar...@wandisco.com> writes:
>> >
>> >> Prompted by the warnings I think there are some issues to fix.  For
>> >> APR_HASH_KEY_STRING keys there is no protection against abnormally long
>> >> keys.  combined_long_key() will allocate strlen() memory even if it is
>> >> many GB.  The item will not get cached if key+data length is more than
>> >> 4GB but the memory for the key, which could be more than 4GB, will be
>> >> permanently allocated in the cache.  There is also a problem with
>> >> overflow in membuffer_cache_set_internal() when calculating key+data
>> >> length, although in practice a key large enough to trigger this will
>> >> probably fail memory allocation first.
>> >
>> > Another issue:
>> >
>> > static svn_boolean_t
>> > entry_keys_match(const entry_key_t *lhs,
>> >                  const entry_key_t *rhs)
>> > {
>> >   return (lhs->fingerprint[0] == rhs->fingerprint[0])
>> >       && (lhs->fingerprint[1] == rhs->fingerprint[1])
>> >       && (lhs->key_len == rhs->key_len);
>> > }
>> >
>> > I think the key_len comparison is wrong and should be removed.  If two
>> > keys have fingerprints that collide it does not mean the key lengths are
>> > the same.  When attempting to use two keys with colliding fingerprints
>> > the behaviour when the key lengths vary should be same as when the key
>> > lengths are the same and the key data varies.
>
>
> Not necessary. It is true that we could decide to *only* use
> the fingerprint as discriminator. But the implementation here
> includes the key length; it reduces the number of conflicts
> that result in entries that evict each other.
>
> Most fixed-length keys are 16 bytes long and for them, the
> fingerprint is almost identical to the key and conflicts are
> extremely unlikely. So, varible-length keys are the one that
> may have fingerprint conflicts now due to the weakness of
> FNV1. For them, however, the key_len (often derived from
> paths) has a good chance of being different.
>
> Note that if the key_len differs, it *cannot* be the same key.
> Hence, including the key_len does not create false negatives.
>
+1.

>> Another issue: find_entry() now calls drop_entry() in more cases and can
>> now call it when find_empty==FALSE during read operations.  On Unix when
>> using the read-write lock this means the cache gets modified while only
>> holding a read lock, not a write lock, and that can corrupt the cache.
>>
As far I understand find_entry(find_empty == TRUE) requires write-lock
for cache, while only read-lock is required for find_empty == FALSE.
Is correct? It would be nice to document this in docstring.

>> If we use read-write locks then when read detects a fingerprint
>> collision it cannot modify the cache to clear the collision, it will
>> have to return NULL and leave the entry in the cache.
>
>
> You are absolutely right. Fixed in r1679681.
>

PS: Just to make things clear: branch has been changed so my +1
doesn't apply to it.

-- 
Ivan Zhakov

Reply via email to