> Perrin Harkins writes:
> > To fix this, we moved to not generating anything until it was
> > requested.  We would fetch the data the first time it was asked for,
> > and then cache it for future requests.  (I think this corresponds to
> > your option 2.)  Of course then you have to decide on a cache
> > consistency approach for keeping that data fresh.  We used a simple
> > TTL approach because it was fast and easy to implement ("good
> > enough").
>
> I'd be curious to know the cache hit stats.

In this case, there was a high locality of access, so we got about a 99% hit
rate.  Obviously not every cache will be this successful.

> BTW, this case seems to
> be an example of immutable data, which is definitely worth caching if
> performance dictates.

It wasn't immutable, but it was data that we could allow to be out of sync
for a certain amount of time that was dictated by the business requirements.
When you dig into it, most sites have a lot of data that can be out of sync
for some period.
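
To give an idea of what "good enough" looked like, here's a minimal
sketch of a TTL cache along those lines.  It's hypothetical (the key
and fetch_from_database() are made up for illustration), not the
actual eToys code:

  my %cache;      # key => { value => ..., stored => epoch seconds }
  my $TTL = 300;  # business rule: data may be up to 5 minutes stale

  sub get_product_data {
      my ($id) = @_;
      my $entry = $cache{$id};
      if ($entry && time() - $entry->{stored} < $TTL) {
          return $entry->{value};            # fresh enough, skip the db
      }
      my $value = fetch_from_database($id);  # the expensive part
      $cache{$id} = { value => $value, stored => time() };
      return $value;
  }

Under mod_perl a lexical like %cache persists for the life of the
child process, so each child warms up its own copy over time.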

> I agree with latter clause, but take issue with the former.  Typical
> sites get a few hits a second at peak times.  If a site isn't
> returning "typical" pages in under a second using mod_perl, it
> probably has some type of basic problem imo.

Some sites have complex requirements.  eToys may have been an anomaly
because of the amount of traffic, but the thing that forced us to cache was
database performance.  Tuning the Perl stuff was not very hard, and it was
all pretty fast to begin with.  Tuning the database access hit a wall:
our DBAs had gone over the queries, the indexes had been adjusted, and some
things were still slow.  The nature of the site design (lots of related data
on a single page) required many database calls and some of them were fairly
heavy SQL.  Some people would say to denormalize the database at that point,
but that's really just another form of caching.

> Use a profiler on the actual code.

Agreed.

> Add
> performance stats in your code.  For example, we encapsulate all DBI
> accesses and accumulate the time spent in DBI on any request.

No need to do that yourself.  Just use DBIx::Profile to find the hairy
queries.
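
From memory, the interface is roughly the following; the exact method
names may differ, so check the DBIx::Profile docs.  Connection details
here are placeholders:

  use strict;
  use DBIx::Profile;   # drop-in subclass of DBI that times each query

  my $dbh = DBIx::Profile->connect('dbi:Oracle:mydb', 'user', 'password');

  # ... handle the request as usual; prepare/execute/fetch get timed ...

  $dbh->printProfile();   # report of time spent per query

That points you straight at the queries worth handing to the DBAs.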

> Adding a cache is piling more code onto a solution.  It sometimes is
> like adding lots of salt to bad cooking.  You do it when you have to,
> but you end up paying for it later.

It may seem like the wrong direction to add code in order to make things go
faster, but you have to consider the relative speeds: Perl code is really
fast, databases are often slower than we want them to be.

Ironically, I am quoted in Philip Greenspun's book on web publishing saying
just what you are saying: that databases should be fast enough without
middle-tier caching.  Sadly, sometimes they just aren't.

- Perrin
