On 1/19/07, Jeff Rogers <[EMAIL PROTECTED]> wrote:
Stephen Deasey wrote:
> Hit rate is not necessarily the best measure of effectiveness. Perhaps
> it's more expensive to fill one cache than it is to fill another? A
> single, shared LRU seemed like the simplest solution. Ideas
> appreciated.
What if 'ns_cache eval' stored with the cached value an estimate of how
much work (i.e., time) it took to create the cache entry, and used that
as a hint for expiring or flushing values? An operation that is used
1/10 as often but takes 50x longer to compute may be worth keeping
around longer.
That might make sense. Save the x50 hit 1/10 of the time.
I guess you'd have to factor in the size. If the x50 hit took a lot
of cache space, it might not be worth it, compared to all those more
frequently used, but less expensive entries you could have stored.
How do you figure out where the line is?
Also, how would you sort the weighted LRU list?
And as long as I'm musing on caching, 2 situations dealing with
expensive cache calculations come to mind. One is where there is no
current value for an entry and several threads request it simultaneously
(or it is requested multiple times before the first request has time to
complete evaluation of the script). Do the second and subsequent
accesses start their own evaluation of the script, or do they block
until the first completes and then all return the single value? Without
digging into the code or running tests, my guess is that this works
correctly, since the key should be locked with a mutex and reads would
block the thread until the mutex is released.
We do the right thing.
There's a lock around the whole cache. If an entry value needs to be
computed, the entry is marked, the lock is released while the value is
computed and the lock is re-taken once the value is ready.
Another thread which takes the lock will discover the marked entry and
sleep waiting for the original thread to complete the cache fill.
A second situation is where there is an expired entry and multiple
threads access it at the same time. It is probably handled as above, as
if there was no entry. But it might be desirable in certain situations
to return a stale entry immediately and start a separate thread to run
the script.
Should work as above.
These solutions may be too clever for their own good and apply only to a
very few marginal cases.
Seems like the cache balancing problem is the kind of thing someone
way smarter than me has already solved. I had a google around but
could only find papers on optimising processor caches...