Thanks for taking the time to reply, Ray. I like running out of simple ideas first too; I'm just worried that this project is going to get big quickly, and I need to have the next several levels of "studliness" rolling around in the back of my head now, I think. ;)
As you rightly guessed, after many trials with a number of techniques, static pages were the one that moved all of the processing overhead "someplace else" and left pure HTML "chunks" in its place. I'm not quite to the point of serving it all up statically, though; there is some other processing that needs to be done, but Apache::ASP handles that admirably. (BTW, just in case anybody here wasn't already sure... Apache::ASP _ROCKS_ !! :)

What are your experiences with Squid? Have you used it before? What sort of improvement have you typically seen from it? (I've put a rough sketch of the setup I'm picturing below, in case I have the wrong idea.)

As far as cache timeouts go, I have a "clever" way of handling that. Each category's listings are stored in its own subdir. I simply regen the listings into a "new" dir and, when they are done, slide the current dir to an "old" dir and the "new" dir into place as the "cur" dir (rough sketch below). That way any file handles that are open stay open and will certainly be concluded by the next round of processing. The only extra load is inodes, and I think I have enough of those.

My real concern with this method is the constant pounding the drives take in the process. It's all async system calls (read: they can block), and I have really wanted to know, all things being equal, which is faster: assembling data from a file on disk, or retrieving data via a networked database (assuming a fast enough backbone)? I've never seen a well-crafted study of this scenario, and I have no opinions I trust on the subject. One of these days I reckon I'll have to model it and find out, but it would be nice to find someone who already has an opinion.

So the next question all this leads to is: what is the best "search" scenario for a similar setup? Currently I'm just searching the database raw, and I already know this will grind to a halt quickly. My first plan is to implement a quick and dirty inverted index (also sketched below). That should take me a ways down the road; how far, I'm not sure, since this will be my first foray into such things.

I'm wondering how and where to improve efficiencies here, and I find myself scratching my head quite a bit. Obviously there has to be some limit on returned tuples, yet all the tuples are selected even if they are culled by a "limit" later, so that part is hard to avoid. Building caches on the fly (say, using the user's session id as a cache identifier; last sketch below) is my next thought. That can get out of hand quickly too, but overall it seems to be the best way to go, since it only has to be done once and can be reaped later by the Session_OnEnd() event.

Any ideas or suggestions on searching? Is this a variation of the same problem, or worse because it's asynchronous?
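So I'm not asking in a vacuum: what I'm picturing is Squid running as an accelerator in front of the Apache box with the static pages, something like the fragment below. The address, the /listings/ path, and the timeout numbers are all made up, so please correct anything that looks off:

    # squid.conf fragment: Squid accelerating the Apache box that
    # serves the static listing pages (IP and path are made up)
    http_port 80
    httpd_accel_host 192.168.1.10
    httpd_accel_port 80
    httpd_accel_with_proxy off

    # let the static category pages sit in the cache for a good while
    refresh_pattern ^/listings/ 60 100% 1440

    # allow an explicit purge from the local box after a regen pass
    acl localhost src 127.0.0.1/32
    acl PURGE method PURGE
    http_access allow PURGE localhost
    http_access deny PURGE

The idea would be that after a regen pass I walk the affected category's URLs with squidclient -m PURGE rather than waiting out the refresh window.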
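For the record, the rotation amounts to something like this per category (not the actual script; the base path and layout are made up):

    #!/usr/bin/perl -w
    # rotate one category's listing dirs: cur -> old, new -> cur
    use strict;
    use File::Path qw(rmtree);

    my $base = "/var/www/listings";              # made-up base path
    my $cat  = shift or die "usage: rotate.pl <category>\n";
    my ($new, $cur, $old) = map { "$base/$cat/$_" } qw(new cur old);

    rmtree($old) if -d $old;                     # retire the previous "old"
    rename($cur, $old) or die "cur -> old: $!" if -d $cur;
    rename($new, $cur) or die "new -> cur: $!";
    # rename() is atomic on one filesystem, and handles already open on
    # files under the old tree keep reading the old inodes until closed.

There is a brief window between the two renames where "cur" doesn't exist; flipping a symlink instead would close even that gap, but the above is the scheme as I described it.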
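And "quick and dirty" for the inverted index means roughly the following -- untested, with tokenizing and the on-disk format waved at rather than thought through (Storable is just a stand-in):

    # build: word => { filename => 1 } over a directory of listing files,
    # then persist it with Storable (no stemming, no stopwords)
    use strict;
    use Storable qw(nstore retrieve);

    my %index;

    sub build_index {
        my ($dir) = @_;
        opendir(my $dh, $dir) or die "$dir: $!";
        for my $file (grep { -f "$dir/$_" } readdir $dh) {
            open(my $fh, "$dir/$file") or next;
            while (<$fh>) {
                $index{lc $1}{$file} = 1 while /(\w+)/g;
            }
            close $fh;
        }
        nstore(\%index, "inverted.idx");
    }

    # search: docs matching the most query terms come back first
    sub search {
        my @terms = map { lc } @_;
        my $idx   = retrieve("inverted.idx");
        my %hits;
        for my $t (@terms) {
            $hits{$_}++ for keys %{ $idx->{$t} || {} };
        }
        return sort { $hits{$b} <=> $hits{$a} } keys %hits;
    }

Even something this crude should get the common searches off the database entirely, and DB_File or a real engine could replace Storable once the index outgrows memory.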
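Finally, by "building caches on the fly" I mean something in this direction. The cache directory and the run_search() helper are inventions for the sake of the sketch; only $Session->SessionID, $Request->Form, and the Session_OnEnd handler in global.asa are Apache::ASP's own:

    # in the search page: reuse this session's result set if we have one
    # (assumes /tmp/searchcache already exists)
    use Storable qw(nstore retrieve);

    my $cache = "/tmp/searchcache/" . $Session->SessionID;
    my $results;
    if (-f $cache) {
        $results = retrieve($cache);                  # later pages of the same query
    } else {
        # made-up helper; assume it returns an array ref of matching listings
        $results = run_search($Request->Form('q'));
        nstore($results, $cache);
    }

    # in global.asa: reap the per-session cache when the session goes away
    sub Session_OnEnd {
        unlink "/tmp/searchcache/" . $Session->SessionID;
    }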
John

Ray Cote wrote:

> >At the moment, rebuilding the static caches is not a big issue; it
> >doesn't take long, and it's possible to get it all rebuilt in less than
> >half an hour. I'm concerned, though, about what the next step will be
> >when the number of listings grows to the point that half an hour (or an hour)
>
> Well, nothing is going to beat static caches -- particularly if you
> put a caching server (such as Squid) in front of it.
>
> One thing you may want to consider is setting your cache timeouts
> fairly long (longer than it takes to generate a single category).
> Then, update a category and clear the cache. While you're generating,
> you will not be serving any of the new pages since the cache won't be
> looking yet.
>
> Also, simply moving the cache to a separate box may be sufficient to
> get the next bump of speed. Again, the cache is what is actually
> 'serving' and not your web server with the static pages.
>
> Would be interested to see how much of the processor you get to use
> during page generation vs. what your Web server is taking.
>
> Along similar lines, you should be able to free up significant disk
> time (and head seeking) by offloading the cache.
>
> Just some initial thoughts on your request. Obviously, there's some
> more complicated approaches you could take, but I like to run out of
> simple ideas first. :}
>
> Ray
> --
> -----------------------------------------------------------------
> Raymond Cote, President              Appropriate Solutions, Inc.
> www.AppropriateSolutions.com         [EMAIL PROTECTED]
> 603.924.6079(v)   POB 458, Peterborough, NH 03458   603.924.8668(f)