Hi Philip,
thanks for your review!
Philip Martin wrote:
Stefan Fuhrman <[email protected]> writes:
svn_cache_t lookup returns a copy of the value found.
For objects of variable size, this can be expensive. If you
check out / export all N files of a directory, for instance,
every single path gets verified. The directory information is
cached for FSFS, but the copy operation is still O(N).
As a result, the whole check takes O(N^2). The good
That sounds bad. Do you have any measurements to show how much
improvement your patch gives?
Well, choose your N ;) For low square-mean numbers of files
per folder, as in TSVN and SVN, the gains were between 3
and 4 percent of total "svn export" runtime. If you want me to,
I could run a test with 10k files in a folder (probably a realistic
upper limit).
news is that we don't need the information for very long.
In fact, copying it takes longer than using it afterwards.
This patch extends the cache interface by a "get and pin"
operation that returns a reference to the cached object
instead of a copy. As long as an entry is pinned, the reference
remains valid. The caller has to unpin it explicitly.
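In caller code, the intended pattern would look roughly like this
(just a sketch; the cache and key here are placeholders, and the
fs_fs.c hunk below shows the real use):

  static svn_error_t *
  use_cached_value(svn_cache_t *cache,
                   const void *key,
                   apr_pool_t *pool)
  {
    void *value;
    svn_boolean_t found;

    SVN_ERR(svn_cache__get_pinned(&value, &found, cache, key, pool));
    if (! found)
      return SVN_NO_ERROR;

    /* Read VALUE directly; no copy has been made. */

    /* The reference stays valid only until this call. */
    return svn_cache__unpin(value, cache, pool);
  }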
What happens if the caller fails to unpin?
Looking at the code, pinning leaves the cache locked until unpin. That
makes failure to unpin an unrecoverable error.
Rationale: see below.
[...]
Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c (revision 930220)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -1822,15 +1822,19 @@
apr_array_header_t *manifest;
apr_pool_t *iterpool;
+ /* We can operate on the cached object directly since we only need it
+  * in this local context. Also, we just want to access a single element
+  * of that array. Copying the whole thing would be expensive.
+  */
shard = rev / ffd->max_files_per_dir;
- SVN_ERR(svn_cache__get((void **) &manifest, &is_cached,
-                         ffd->packed_offset_cache, &shard, pool));
+ SVN_ERR(svn_cache__get_pinned((void **) &manifest, &is_cached,
+                               ffd->packed_offset_cache, &shard,
+                               pool));
if (is_cached)
{
*rev_offset = APR_ARRAY_IDX(manifest, rev % ffd->max_files_per_dir,
                            apr_off_t);
Adding
SVN_ERR(svn_foo(...));
would now be a bug because an error would cause the unpin to be
skipped. This is an interface that is easy to get wrong; how about a
callback interface similar to svn_wc__call_with_write_lock?
The implementation is supposed to neither pin nor lock upon
lookup failure. Perhaps I should make that clear in the
API documentation.
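A callback wrapper in the style of svn_wc__call_with_write_lock would
keep the pin/unpin pairing in one place. A rough sketch (the type and
function names below are made up; only svn_cache__get_pinned and
svn_cache__unpin come from the patch):

  /* Callback invoked while the looked-up entry is pinned. */
  typedef svn_error_t *(*svn_cache__pinned_func_t)(void *value,
                                                   void *baton,
                                                   apr_pool_t *pool);

  static svn_error_t *
  call_with_pinned(svn_boolean_t *found,
                   svn_cache_t *cache,
                   const void *key,
                   svn_cache__pinned_func_t func,
                   void *baton,
                   apr_pool_t *pool)
  {
    void *value;
    svn_error_t *err;

    SVN_ERR(svn_cache__get_pinned(&value, found, cache, key, pool));
    if (! *found)
      return SVN_NO_ERROR;

    /* Run the caller's code while the entry is pinned, then unpin
     * unconditionally, so an error inside FUNC cannot leak the pin. */
    err = func(value, baton, pool);
    if (err)
      {
        svn_error_clear(svn_cache__unpin(value, cache, pool));
        return err;
      }
    return svn_cache__unpin(value, cache, pool);
  }

Adding an SVN_ERR(svn_foo(...)) inside the callback would then be safe.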
- return SVN_NO_ERROR;
+ return svn_cache__unpin((void*)manifest,
ffd->packed_offset_cache, pool);
}
/* Open the manifest file. */
Index: subversion/libsvn_subr/cache-inprocess.c
===================================================================
--- subversion/libsvn_subr/cache-inprocess.c (revision 930220)
+++ subversion/libsvn_subr/cache-inprocess.c (working copy)
@@ -66,6 +66,13 @@
*/
apr_pool_t *cache_pool;
+ /* Reference to the item returned by inprocess_cache_get_pinned.
+ * Currently, we support at most one item being pinned at any given
+ * time. While it is pinned, the mutex will be locked as well.
+ * If NULL, no item has been pinned.
+ */
+ struct cache_entry *pinned_entry;
+
#if APR_HAS_THREADS
/* A lock for intra-process synchronization to the cache, or NULL if
* the cache's creator doesn't feel the cache needs to be
@@ -205,31 +212,38 @@
return err;
}
+/* Implement get in terms of get_pinned, copy and unpin.
+ * The overhead of doing so is marginal.
+ *
+ * This is also an example of how client code should use the
+ * pin / unpin functionality.
+ */
static svn_error_t *
+inprocess_cache_get_pinned(void **value_p,
+ svn_boolean_t *found,
+ void *cache_void,
+ const void *key,
+ apr_pool_t *pool);
+
+static svn_error_t *
+inprocess_cache_unpin(const void *value_p,
+ void *cache_void,
+ apr_pool_t *pool);
+
+static svn_error_t *
inprocess_cache_get(void **value_p,
svn_boolean_t *found,
void *cache_void,
const void *key,
apr_pool_t *pool)
{
- inprocess_cache_t *cache = cache_void;
- struct cache_entry *entry;
- svn_error_t *err;
+ void* temp;
+ SVN_ERR(inprocess_cache_get_pinned(&temp, found, cache_void, key,
+                                    pool));
+ if (!*found)
+ return SVN_NO_ERROR;
- SVN_ERR(lock_cache(cache));
-
- entry = apr_hash_get(cache->hash, key, cache->klen);
- if (! entry)
- {
- *found = FALSE;
- return unlock_cache(cache, SVN_NO_ERROR);
- }
-
- move_page_to_front(cache, entry->page);
-
- *found = TRUE;
- err = duplicate_value(value_p, cache, entry->value, pool);
- return unlock_cache(cache, err);
+ SVN_ERR(duplicate_value(value_p, cache_void, temp, pool));
+ return inprocess_cache_unpin(temp, cache_void, pool);
}
/* Removes PAGE from the LRU list, removes all of its entries from
@@ -275,6 +289,14 @@
SVN_ERR(lock_cache(cache));
+ /* We got the mutex. Thus, other threads cannot have pinned an item.
+  * But the current thread may try to modify the cache before unpinning
+  * the pinned item.
+  */
+ if (cache->pinned_entry)
+   return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+                           _("Cannot modify the cache while an object is pinned."));
+
It's undefined behaviour for a thread to lock a POSIX mutex that is
already locked in the same thread. Are you relying on the code above,
or is it really debugging code? Perhaps use SVN_ERR_ASSERT? You need
to make the mutex recursive if this is to be reliable but I don't know
if all platforms support recursive mutexes.
That is debugging code, so SVN_ERR_ASSERT is certainly
a better choice. My whole "try a simple implementation" approach
may not be suitable here (see below).
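For reference, if the cache mutex ever has to be taken again by the
same thread, APR can be asked for a recursive mutex when the mutex is
created, though, as you say, whether that is honoured is
platform-dependent. A minimal sketch (create_cache_mutex is a made-up
helper name):

  #include <apr_thread_mutex.h>

  static apr_status_t
  create_cache_mutex(apr_thread_mutex_t **mutex, apr_pool_t *pool)
  {
    /* APR_THREAD_MUTEX_NESTED asks for a recursive mutex. */
    return apr_thread_mutex_create(mutex, APR_THREAD_MUTEX_NESTED, pool);
  }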
existing_entry = apr_hash_get(cache->hash, key, cache->klen);
/* Is it already here, but we can do the one-item-per-page
@@ -403,16 +425,91 @@
b.user_baton = user_baton;
SVN_ERR(lock_cache(cache));
+
+ /* We got the mutex. Thus, other threads cannot have pinned an item.
+  * But the current thread may try to access the cache before unpinning
+  * the pinned item.
+  * While this restriction is not strictly necessary here, we may want
+  * to enforce the pin-evaluate-unpin pattern regardless of the threading
+  * model, for instance. Otherwise, we may end up with Heisenbugs etc.
+  */
+ if (cache->pinned_entry)
+   return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+                           _("Cannot read the cache while an object is still pinned."));
+
return unlock_cache(cache,
                    svn_iter_apr_hash(completed, cache->hash,
                                      iter_cb, &b, pool));
}
+static svn_error_t *
+inprocess_cache_get_pinned(void **value_p,
+ svn_boolean_t *found,
+ void *cache_void,
+ const void *key,
+ apr_pool_t *pool)
+{
+ inprocess_cache_t *cache = cache_void;
+ struct cache_entry *entry;
+
+ SVN_ERR(lock_cache(cache));
+
+ /* We got the mutex. Thus, other threads cannot have pinned an item.
+  * But the current thread may try to pin two entries or the same entry
+  * twice.
+  */
+ if (cache->pinned_entry)
+   return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+                           _("Cannot pin more than one object."));
+
+ /* Lookup. */
+ entry = apr_hash_get(cache->hash, key, cache->klen);
+ if (! entry)
+ {
+ *found = FALSE;
+ return unlock_cache(cache, SVN_NO_ERROR);
+ }
+
+ /* LRU mechanism. */
+ move_page_to_front(cache, entry->page);
+
+ /* Mark / pin the entry found and return the reference */
+ *found = TRUE;
+ cache->pinned_entry = entry;
+ *value_p = entry->value;
+
+ /* Cache remains locked until *unpin has been called. */
+ return SVN_NO_ERROR;
+}
So pinning leaves the cache locked and failing to unpin is essentially
an unrecoverable error.
Yes. Even if the locking were not a problem: pinning
is equivalent to locking a resource (the entry, in this case).
Memory, for instance, will not be reclaimed until you destroy
the whole cache. This is the very purpose of this functionality.
I think your claim that the interface allows more than one item to be
pinned is also suspect. How would the locking work? Could two
threads each pin an item?
The current implementation is more restrictive than what
the interface allows. Locking the cache is a by-product
of that simplistic implementation.
Do you really need to leave the cache locked? Would it be enough to
arrange for the item not to be evicted from the LRU cache? Move the
pinned item from the LRU cache to a pinned cache, and move it back
when unpinned?
An alternative implementation would associate every
cache entry with a pin count. While the latter is > 0,
the respective entry will not be evicted from the cache.
The basic meaning of "pin" is: 'keep this reference
valid for as long as it is pinned'.
The downside is that you may be tempted to pin many
items or to pin them for longer, possibly forgetting
to release them in certain edge cases.
That is the reason why I chose to allow at most one pinned
item at any given time. Mistakes in cache usage would become
obvious pretty quickly. The drawback is that multi-threaded
code needs additional coordination to make that "only 1"
limit feasible. Hence the lock.
If your assessment is that "misusing" the lock is riskier
than a temporary memory leak, I can change the
implementation this weekend and post a new patch.
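For reference, a rough sketch of that pin-count variant as it might
read in cache-inprocess.c. Only get_pinned is shown; unpin would
decrement the count under the same lock, the pin_count member would be
new, and the eviction code (not shown) would have to skip entries with
pin_count > 0:

  static svn_error_t *
  inprocess_cache_get_pinned(void **value_p,
                             svn_boolean_t *found,
                             void *cache_void,
                             const void *key,
                             apr_pool_t *pool)
  {
    inprocess_cache_t *cache = cache_void;
    struct cache_entry *entry;

    SVN_ERR(lock_cache(cache));

    entry = apr_hash_get(cache->hash, key, cache->klen);
    if (! entry)
      {
        *found = FALSE;
        return unlock_cache(cache, SVN_NO_ERROR);
      }

    move_page_to_front(cache, entry->page);

    /* Keep the entry (and its page) alive until the count drops to 0. */
    entry->pin_count++;

    *found = TRUE;
    *value_p = entry->value;

    /* Unlike the current patch, the mutex is released right away, so
     * other threads and later cache calls are not blocked by the pin. */
    return unlock_cache(cache, SVN_NO_ERROR);
  }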
-- Stefan^2.