Re: SELECT cacheing

Tim Sweetman Fri, 08 Sep 2000 07:47:43 -0700
DeWitt - this started as a reply to the modperl mailing list, & I had a
look at File::Cache as my reply grew. See the end of this for the
relevant bit :) - think I've found a bug...

Drew Taylor wrote:
> 
> Roger Espel Llima wrote:
> >
> > I've written a very small module to cache SELECT results from DBI
> > requests.  The interface looks like:
> >
> >   use SelectCache;
> >
> >   my $db = whatever::get_a_handle();
> >   my $st = qq{ select this, that ... };
> >   my $rows = SelectCache::select($db, $st, 180);
> >
> > this returns an arrayref of rows (like the selectall_arrayref function),
> > and caches the result in a file, which gets reused for 180 seconds
> > instead of asking the db again.

"Storable" is probably a good way to store this sort of result.

> > The names of the cache files are the md5's of the select statement,
> > using the last hex digit as a subdirectory name.  There's no file
> > cleanup function; you can always do that from cron with find.
> >
> > This is all very simple, but it's pretty useful in combination with
> > mod_perl, to speed up things like showing the "latest 10 posts", on
> > frequently accessed webpages.

> > The question now is: is there any interest in releasing this?  I could
> > write some minimal docs and give it a 'proper' module name, if there's
> > interest.

This can be an extremely powerful approach to speeding up web
applications. We use a similar module which ended up fairly large - it
takes a method name & arguments, rather than an SQL string, meaning that
you can cache the result of operations other than SQL queries. It's also
grown several other enhancements: a mutual-exclusion-and-backoff
algorithm, so if one process is looking for the answer, others wait for
it rather than performing the same query at the same time, and several
ways to expire results that have become outdated (specifying lifetime,
or via timestamp files that get touched when major changes happen)

I always thought it'd make a good thing to CPANify but never got round
to it :(

The one thing I'd advise is: BE VERY CAREFUL WITH RACE CONDITIONS. You
can easily end up with something that will, in an unusual case, store
garbled data. I think you'd need to either use flock(), or write to
files then rename them, since rename is an atomic operation - and I
don't know how well that works under OSs other than UNIXes.

> I'm certainly interested. One question though - in the module do you
> blindly use the cache? I ask because in my instance I display the
> contents of a shopping cart on every page.

I think this would be tricky to use with a cache - cart contents will
change in real time, and there's one copy per user, so you'd need a way
of expiring the cached data according to user ID. 

Some RDBMSs get Large performance improvements from using placeholders
("select * from foo where userid = ?") and cacheing the statement
handles - I don't know if this applies to MySQL. With your sort of
application I'd try those measures before trying to use a complex cache
mechanism. Where up-to-date results are not critical, a cache mechanism
has great merit, IMHO.

Reading back along this thread,

Perrin Hawkins wrote:
> - Use the DBIx namespace for the module. 

Possibly. SQL is not the only application for this sort of tool, though
it seems to be the main one.

> - If possible, use some existing cache module for the storage, like
> Apache::Session or one of the m/Cache/ modules on CPAN.

IIRC, Apache::Session *generates* its own key for each session. This
isn't going to work with a MD5-keyed-cache, where the key is generated
from the SQL.

File::Cache seems to do something rather similar, though without the MD5
bit. However, from a cursory look at the code, I think it's vulnerable
to concurrency conditions such as:
+ process (a) reads a file whilst (b) is still writing it
+ processes (a) and (b) both write to a file simultaneously, possibly
corrupting it?!
  (this may be impossible, not sure)
+ process fails whilst writing a file (eg. process catches a KILL);
subsequent reads of that file
  will get fatal error

... which will pop up only Sometimes, usually on a busy site open to the
public :) This is Not Nice, assuming it's true.

Many CPAN things that do this sort of thing use tied hashes, which
(mostly, at least) won't work in a multi-process environment because
they don't handle concurrent reads & writes.

Cheers

--
Tim Sweetman
A L Digital
Re: SELECT cacheing

Reply via email to