On Fri, May 15, 2015 at 7:25 PM, Philip Martin <philip.mar...@wandisco.com> wrote:
> Philip Martin <philip.mar...@wandisco.com> writes: > > > Philip Martin <philip.mar...@wandisco.com> writes: > > > >> Prompted by the warnings I think there are some issues to fix. For > >> APR_HASH_KEY_STRING keys there is no protection against abnormally long > >> keys. combined_long_key() will allocate strlen() memory even if it is > >> many GB. The item will not get cached if key+data length is more than > >> 4GB but the memory for the key, which could be more than 4GB, will be > >> permanently allocated in the cache. There is also a problem with > >> overflow in membuffer_cache_set_internal() when calculating key+data > >> length, although in practice a key large enough to trigger this will > >> probably fail memory allocation first. > > > > Another issue: > > > > static svn_boolean_t > > entry_keys_match(const entry_key_t *lhs, > > const entry_key_t *rhs) > > { > > return (lhs->fingerprint[0] == rhs->fingerprint[0]) > > && (lhs->fingerprint[1] == rhs->fingerprint[1]) > > && (lhs->key_len == rhs->key_len); > > } > > > > I think the key_len comparison is wrong and should be removed. If two > > keys have fingerprints that collide it does not mean the key lengths are > > the same. When attempting to use two keys with colliding fingerprints > > the behaviour when the key lengths vary should be same as when the key > > lengths are the same and the key data varies. > Not necessary. It is true that we could decide to *only* use the fingerprint as discriminator. But the implementation here includes the key length; it reduces the number of conflicts that result in entries that evict each other. Most fixed-length keys are 16 bytes long and for them, the fingerprint is almost identical to the key and conflicts are extremely unlikely. So, varible-length keys are the one that may have fingerprint conflicts now due to the weakness of FNV1. For them, however, the key_len (often derived from paths) has a good chance of being different. Note that if the key_len differs, it *cannot* be the same key. Hence, including the key_len does not create false negatives. > Another issue: find_entry() now calls drop_entry() in more cases and can > now call it when find_empty==FALSE during read operations. On Unix when > using the read-write lock this means the cache gets modified while only > holding a read lock, not a write lock, and that can corrupt the cache. > > If we use read-write locks then when read detects a fingerprint > collision it cannot modify the cache to clear the collision, it will > have to return NULL and leave the entry in the cache. > You are absolutely right. Fixed in r1679681. -- Stefan^2.