Hello Laurent,

On Mar 10, 11:45 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:

>  * usage of refs : I had a bad feeling, and cgrand confirmed this to
> me by pointing an even more interesting counter-argument. Me: using
> refs is not mandatory since you do not need to synchronize this change
> with anything else.

I don't think that this is entirely true! You have to synchronise
the cache with the state in the strategy. This can be done with atoms
only if the cache and the strategy state are contained in the same
atom and all updates happen in a single call to swap! (or reset!). As
soon as you have multiple calls, you need transactions again, because
the atom might change between the two calls. And you can't swap! and
return a result at the same time.

Say you have the LRU strategy. You deref the cache and find the
result cached. Now you swap! the new access time into your state. But
between the deref and the swap! another call might trigger the removal
of the entry. So you always have to guard against the atom suddenly
changing underneath you - even if it's only one atom.
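
To make this concrete, here is a minimal sketch of the problematic
read-then-update pattern (the names lru-state and lookup are mine, for
illustration only):

(def lru-state (atom {:cache {} :access {}}))

(defn lookup [args]
  ;; Step 1: deref the atom and look for a cached entry.
  (when-let [e (find (:cache @lru-state) args)]
    ;; Between the deref above and the swap! below another thread may
    ;; evict the entry, so the access bookkeeping can end up referring
    ;; to a key that is no longer in the cache.
    (swap! lru-state assoc-in [:access args] (System/nanoTime))
    (val e)))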

Something that annoys me more is that the computation may be fired
off several times. Since f is supposedly expensive, I'd rather avoid
that.
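
For illustration, the straightforward atom-based memoize below (my
own sketch, not code from this thread) shows the issue: two threads
that miss at the same time both evaluate f.

(defn naive-memoize [f]
  (let [cache (atom {})]
    (fn [& args]
      (if-let [e (find @cache args)]
        (val e)
        ;; Two threads can both reach this branch for the same args;
        ;; f is then evaluated twice and one result overwrites the
        ;; other.
        (let [result (apply f args)]
          (swap! cache assoc args result)
          result)))))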

>  * lookup function responsibilities:  I cannot (yet) offer a better
> way to approach the problem, but I have a bad feeling with the lookup
> function changing things "behind the scene".

Can you explain a little more about what you think the problem is?
Things like LRU need some bookkeeping on access. Otherwise one just
does not have the required information at hand and the strategy is
not possible. So I think you mean something else, or you consider LRU
a bad strategy. I have no experience with the different strategies,
but I tend to believe the first possibility.

If this is true, you are referring to the TTL strategy, where the
lookup function removes old entries. I also don't really like this.
What are the possibilities?

* removal in cache-update: may cause items to linger quite long in
  the cache depending on the call structure, possibly much longer than
  the TTL; still needs special handling in cache-lookup or out-of-date
  entries will be returned
* removal in cache-lookup: happens (possibly) much more often
* a dedicated thread, created whenever a new entry is added, which
  sleeps for the entry's TTL, then wakes up and removes it. This will
  however potentially create a lot of threads just sleeping around. I
  don't know how expensive this is.
* a cron-like thread, which sleeps until the next removal time,
  removes the out-of-date entries for that point, sleeps again, and
  so on. This would be only one thread, but a more difficult
  implementation, and it requires cross-thread synchronisation.

The cache-lookup variant provides the most bang for the buck if you
want to keep it simple.
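
A minimal sketch of that variant (my own illustration; the state
layout with :cache and :ttl keys and the name ttl-lookup are
assumptions):

(def ttl-ms 5000) ; assumed time-to-live in milliseconds

;; Evict expired entries first, then consult the (possibly pruned)
;; cache. The eviction is a single swap!, so it is safe on its own,
;; but the lookup still has to reread the atom afterwards.
(defn ttl-lookup [cache-state args]
  (let [now (System/currentTimeMillis)
        expired? (fn [[_ t]] (< (+ t ttl-ms) now))
        prune (fn [state]
                (let [old (map key (filter expired? (:ttl state)))]
                  (-> state
                      (update-in [:cache] #(apply dissoc % old))
                      (update-in [:ttl] #(apply dissoc % old)))))]
    (swap! cache-state prune)
    (find (:cache @cache-state) args)))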

> Christophe: And by using refs, you synchronize the
> change with a potential uber STM transaction, and if this uber STM
> transaction retries, you will not benefit from the memoization, since
> the memoization itself will be discarded by the retry.

This is a problem the STM cannot solve. We would need some
dont-nest-me-dosync, which does not merge with a possible
surrounding dosync. (This was actually part of the famous
discussion between Rich and Cliff Click: How to define the
borders of a transaction region?)

I gave it another run and came up with another solution. Even more
hairy. :P

* a warm cache is fast
* a cache miss is potentially slow, but the missing value is computed
  only once; multiple requests for the same item wait for the
  initially triggered computation
* the cache can be updated (almost) in parallel for different items

;; Note: this deliberately shadows clojure.core/memoize.
(defn memoize
  ([f] (memoize f naive-strategy))
  ([f [cache-lookup cache-update]]
   (let [cache-state (atom {})]
     (fn [& args]
       (if-let [e (cache-lookup cache-state args)]
         ;; Cache hit: the cached value is a promise; deref it.
         @(val e)
         ;; Cache miss: take the lock and re-check, since another
         ;; thread may have installed the promise in the meantime.
         ;; Only the cheap installation happens under the lock; the
         ;; deref that waits for the result sits outside the locking
         ;; form, so different items can be computed in parallel.
         @(locking cache-state
            (if-let [e (cache-lookup cache-state args)]
              (val e)
              (let [result (promise)]
                ;; Compute on a separate thread and deliver into the
                ;; promise; every caller for these args blocks on it.
                (.start (Thread. #(deliver result (apply f args))))
                (swap! cache-state assoc-in [:cache args] result)
                (cache-update cache-state args)
                result))))))))
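
naive-strategy is not shown above; a minimal sketch of what it could
look like (an assumption on my part, matching the expected
[cache-lookup cache-update] pair):

(def naive-strategy
  ;; Cache everything, never evict, no bookkeeping on access.
  [(fn cache-lookup [cache-state args]
     ;; Returns a map entry (args mapped to a promise) or nil.
     (find (:cache @cache-state) args))
   (fn cache-update [cache-state args]
     nil)])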

But with this you can't reliably implement strategies like LRU.

But this is really an interesting problem. I'll mull a little more
about it. :)

Sincerely
Meikel
