Thanks for taking the time to reply, Ray. I like running out of simple ideas first too; I'm just worried that this project is going to get big quickly, and I need to have the next several levels of "studliness" rolling around in the back of my head now, I think. ;)
As you rightly guessed, after many trials with a number of techniques, static pages were the one that moved all of the processing overhead "someplace else" and left pure HTML "chunks" in its place. I'm not quite to the point of serving it all up statically, though; there is some other processing that needs to be done, but Apache::ASP handles that admirably. (BTW, just in case anybody here wasn't already sure... Apache::ASP _ROCKS_ !! :)

What are your experiences with Squid? Have you used it before? What sort of improvement have you typically seen from it? (I've put a rough sketch of the setup I'm picturing below, in case I have the wrong idea.)

As far as cache timeouts go, I have a "clever" way of handling that. Each category's listings are stored in its own subdir. I simply regen the listings into a "new" dir and, when they are done, slide the current dir to an "old" dir and the "new" dir into place as the "cur" dir (rough sketch below). That way any file handles that are open stay open and will certainly be concluded by the next round of processing. The only extra load is inodes, and I think I have enough of those.

My real concern with this method is the constant pounding the drives take in the process. It's all async system calls (read: they can block), and I have really wanted to know, all things being equal, which is faster: assembling data from a file on disk, or retrieving data via a networked database (assuming a fast enough backbone)? I've never seen a well-crafted study of this scenario, and I have no opinions I trust on the subject. One of these days I reckon I'll have to model it and find out, but it would be nice to find someone who already has an opinion.

So the next question all this leads to is: what is the best "search" scenario for a similar setup? Currently I'm just searching the database raw, and I already know this will grind to a halt quickly. My first plan is to implement a quick and dirty inverted index (also sketched below). That should take me a ways down the road; how far, I'm not sure, since this will be my first foray into such things.

I'm wondering how and where to improve efficiencies here, and I find myself scratching my head quite a bit. Obviously there has to be some limit on returned tuples, yet all the tuples are selected even if they are culled by a "limit" later, so that part is hard to avoid. Building caches on the fly (say, using the user's session id as a cache identifier; last sketch below) is my next thought. That can get out of hand quickly too, but overall it seems to be the best way to go, since it only has to be done once and can be reaped later by the Session_OnEnd() event.

Any ideas or suggestions on searching? Is this a variation of the same problem, or worse because it's asynchronous?
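So I'm not asking in a vacuum: what I'm picturing is Squid running as an accelerator in front of the Apache box with the static pages, something like the fragment below. The address, the /listings/ path, and the timeout numbers are all made up, so please correct anything that looks off:

    # squid.conf fragment: Squid accelerating the Apache box that
    # serves the static listing pages (IP and path are made up)
    http_port 80
    httpd_accel_host 192.168.1.10
    httpd_accel_port 80
    httpd_accel_with_proxy off

    # let the static category pages sit in the cache for a good while
    refresh_pattern ^/listings/ 60 100% 1440

    # allow an explicit purge from the local box after a regen pass
    acl localhost src 127.0.0.1/32
    acl PURGE method PURGE
    http_access allow PURGE localhost
    http_access deny PURGE

The idea would be that after a regen pass I walk the affected category's URLs with squidclient -m PURGE rather than waiting out the refresh window.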
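For the record, the rotation amounts to something like this per category (not the actual script; the base path and layout are made up):

    #!/usr/bin/perl -w
    # rotate one category's listing dirs: cur -> old, new -> cur
    use strict;
    use File::Path qw(rmtree);

    my $base = "/var/www/listings";              # made-up base path
    my $cat  = shift or die "usage: rotate.pl <category>\n";
    my ($new, $cur, $old) = map { "$base/$cat/$_" } qw(new cur old);

    rmtree($old) if -d $old;                     # retire the previous "old"
    rename($cur, $old) or die "cur -> old: $!" if -d $cur;
    rename($new, $cur) or die "new -> cur: $!";
    # rename() is atomic on one filesystem, and handles already open on
    # files under the old tree keep reading the old inodes until closed.

There is a brief window between the two renames where "cur" doesn't exist; flipping a symlink instead would close even that gap, but the above is the scheme as I described it.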
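And "quick and dirty" for the inverted index means roughly the following -- untested, with tokenizing and the on-disk format waved at rather than thought through (Storable is just a stand-in):

    # build: word => { filename => 1 } over a directory of listing files,
    # then persist it with Storable (no stemming, no stopwords)
    use strict;
    use Storable qw(nstore retrieve);

    my %index;

    sub build_index {
        my ($dir) = @_;
        opendir(my $dh, $dir) or die "$dir: $!";
        for my $file (grep { -f "$dir/$_" } readdir $dh) {
            open(my $fh, "$dir/$file") or next;
            while (<$fh>) {
                $index{lc $1}{$file} = 1 while /(\w+)/g;
            }
            close $fh;
        }
        nstore(\%index, "inverted.idx");
    }

    # search: docs matching the most query terms come back first
    sub search {
        my @terms = map { lc } @_;
        my $idx   = retrieve("inverted.idx");
        my %hits;
        for my $t (@terms) {
            $hits{$_}++ for keys %{ $idx->{$t} || {} };
        }
        return sort { $hits{$b} <=> $hits{$a} } keys %hits;
    }

Even something this crude should get the common searches off the database entirely, and DB_File or a real engine could replace Storable once the index outgrows memory.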
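Finally, by "building caches on the fly" I mean something in this direction. The cache directory and the run_search() helper are inventions for the sake of the sketch; only $Session->SessionID, $Request->Form, and the Session_OnEnd handler in global.asa are Apache::ASP's own:

    # in the search page: reuse this session's result set if we have one
    # (assumes /tmp/searchcache already exists)
    use Storable qw(nstore retrieve);

    my $cache = "/tmp/searchcache/" . $Session->SessionID;
    my $results;
    if (-f $cache) {
        $results = retrieve($cache);                  # later pages of the same query
    } else {
        # made-up helper; assume it returns an array ref of matching listings
        $results = run_search($Request->Form('q'));
        nstore($results, $cache);
    }

    # in global.asa: reap the per-session cache when the session goes away
    sub Session_OnEnd {
        unlink "/tmp/searchcache/" . $Session->SessionID;
    }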
John

Ray Cote wrote:

> >At the moment, rebuilding the static caches is not a big issue; it
> >doesn't take long, and it's possible to get it all rebuilt in less than
> >half an hour. I'm concerned, though, about what the next step will be
> >when the number of listings grows to the point that half an hour (or an hour)
>
> Well, nothing is going to beat static caches -- particularly if you
> put a caching server (such as Squid) in front of it.
>
> One thing you may want to consider is setting your cache timeouts
> fairly long (longer than it takes to generate a single category).
> Then, update a category and clear the cache. While you're generating,
> you will not be serving any of the new pages since the cache won't be
> looking yet.
>
> Also, simply moving the cache to a separate box may be sufficient to
> get the next bump of speed. Again, the cache is what is actually
> 'serving' and not your web server with the static pages.
>
> Would be interested to see how much of the processor you get to use
> during page generation vs. what your Web server is taking.
>
> Along similar lines, you should be able to free up significant disk
> time (and head seeking) by offloading the cache.
>
> Just some initial thoughts on your request. Obviously, there's some
> more complicated approaches you could take, but I like to run out of
> simple ideas first. :}
>
> Ray
> --
> -----------------------------------------------------------------
> Raymond Cote, President              Appropriate Solutions, Inc.
> www.AppropriateSolutions.com         [EMAIL PROTECTED]
> 603.924.6079(v)   POB 458, Peterborough, NH 03458   603.924.8668(f)