Here is how the system works now; it is live. Every time a page is generated, the most recent node ID is stored along with the cached file. The next time the page is viewed, the system checks which node is now the most recent and compares it against the one that was newest when the file was cached. If they're the same, nothing has changed, and the cached file is served. If they're different, the system looks through the nodes added since the page was cached and checks whether the original node's text contains any of those node names. If it does, it regenerates, recaches, and serves the page. Otherwise, it revalidates the cache file by storing the new most recent node ID with the old cached file, and serves that up.
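In rough code it looks something like this (the helper names here are made up for illustration, not the actual functions in the engine):

    sub serve_page {
        my ($node) = @_;
        my $newest = newest_node_id();           # most recent node ID in the database
        my $cache  = read_cache($node);          # { html => ..., newest_id => ... } or undef

        if (!$cache) {
            my $html = regenerate_page($node);
            write_cache($node, $html, $newest);
            return $html;
        }

        # Nothing has been added since this page was cached.
        return $cache->{html} if $cache->{newest_id} == $newest;

        # Something was added: does this node's text mention any new node titles?
        my @new_titles = nodes_added_since($cache->{newest_id});
        if (grep { index($node->{text}, $_) >= 0 } @new_titles) {
            my $html = regenerate_page($node);   # text now links somewhere new
            write_cache($node, $html, $newest);
            return $html;
        }

        # No new links possible: revalidate the old HTML under the new newest ID.
        write_cache($node, $cache->{html}, $newest);
        return $cache->{html};
    }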
This is not a bad approach, but there's room for refinement.
The problem with this is that 99% of the time, the document won't contain any of the new node names, so mod_perl is wasting most of its time serving up cached HTML.
It sounds like the problem is not so much that mod_perl is serving cached HTML, since that is easily offloaded to a reverse proxy server, but rather that your entire cache becomes suspect whenever anyone creates a new node, and mod_perl has to spend time rechecking and revalidating pages that usually haven't actually changed.
I think you could improve this a great deal just by changing your cache invalidation system. When someone creates a new node, rather than assuming that anything cached before the most recent addition is now suspect, figure out which nodes are actually affected and invalidate only their caches. The way I would do this is by adding full-text search over your data, using something like MySQL's FULLTEXT indexes, which let you index new documents on the fly rather than rebuilding the whole index. Then, when someone adds a new node called "Dinosaurs", you search for all nodes whose text contains the word "Dinosaurs" and invalidate their caches.
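Roughly something like this, guessing at your schema (a nodes table with node_id and doctext columns, a FULLTEXT index on doctext) and using a made-up delete_cache() helper:

    # Assumes a FULLTEXT index on the node text, e.g.:
    #   ALTER TABLE nodes ADD FULLTEXT (doctext);
    # Table/column names and delete_cache() are guesses, not your real schema.
    use DBI;

    sub invalidate_for_new_node {
        my ($dbh, $new_title) = @_;

        my $sth = $dbh->prepare(
            'SELECT node_id FROM nodes WHERE MATCH(doctext) AGAINST(?)'
        );
        $sth->execute($new_title);

        while (my ($node_id) = $sth->fetchrow_array) {
            delete_cache($node_id);    # only these pages lose their cache
        }
    }

    # e.g. after creating the "Dinosaurs" node:
    # invalidate_for_new_node($dbh, 'Dinosaurs');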
If you want to improve response time even more, you can have a cron job that periodically rebuilds anything without a cached version. Since adding a new node will now invalidate a small set of pages rather than the entire site, the job will only need to touch a few pages each time it runs. Of course, this is not practical if people often add new nodes that need to be linked to from thousands of pages.
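For the common case, the whole job can be as simple as something like this (again, the helper names are invented, not part of your code):

    #!/usr/bin/perl
    # Run from cron, e.g. every five minutes:
    #   */5 * * * * /path/to/rebuild_cache.pl
    # uncached_node_ids() and regenerate_and_cache() are invented helpers.
    use strict;
    use warnings;

    for my $node_id (uncached_node_ids()) {
        regenerate_and_cache($node_id);    # usually only a handful of pages
    }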
- Perrin