On 5/19/07, Will Fould <[EMAIL PROTECTED]> wrote:
> I'm afraid that:
>  1. the hashes get really big (more than a few MB each),
>  2. re-caching an entire hash just because one key changed is wasteful,
>  3. there's latency in pulling cache data from the remote DB, and
>  4. all of this has to be done for every child.
The most common way to improve speed is to cache things after you fetch them from the db, rather than pre-fetching as you are now. You give them a reasonable timeout value, and always check the cache for data first, falling back to the db if it's not there. For applications that can tolerate a little stale data and have a relatively small set of hot data, this works great. It also assumes that you can make your code fetch from the db (when the result is not cached yet) in a slow but reasonable amount of time.

If you want to stick with pre-fetching, you have a few options. One is to use memcached. It will be much slower than your current method. However, you can update values whenever you like and they will be visible to all processes on all servers immediately. You can't count on data to be there though -- you have to structure your application so it can fetch from the db if memcached drops some data. It is not a database.

Another is to build local shared caches with BerkeleyDB, MySQL on the local machine, or Cache::FastMmap. All of these will be faster than a remote memcached. You can update them with a cron job on each server and all children will see the results immediately. The same caveats about surviving missing data apply for Cache::FastMmap -- it's not a database either.

In both cases, you are going to sacrifice performance. What you'll get for your trouble is memory -- no more duplicating MBs of data in every process.
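To make the first suggestion concrete, here is a rough sketch of the check-the-cache-first pattern, using Cache::Memcached in front of DBI. The server address, key format, 'records' table, and 300-second timeout are illustrative assumptions, not anything from the original setup:

    # Rough sketch: check the cache first, fall back to the db on a miss.
    # Server address, key names, table, and timeout are made-up examples.
    use strict;
    use warnings;
    use Cache::Memcached;
    use DBI;

    my $memd = Cache::Memcached->new({
        servers => ['127.0.0.1:11211'],
    });

    sub get_record {
        my ($dbh, $id) = @_;
        my $key = "record:$id";

        # Always check the cache first.
        my $row = $memd->get($key);
        return $row if defined $row;

        # Cache miss (or memcached dropped the item) -- fall back to
        # the database, slow but correct.
        $row = $dbh->selectrow_hashref(
            'SELECT * FROM records WHERE id = ?', undef, $id,
        );

        # Store it with a reasonable timeout so stale data expires.
        $memd->set($key, $row, 300) if $row;

        return $row;
    }
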
> For now, what seems like the 'holy grail' (*) is to cache a last_modified value for each type (available to the whole cluster, say through memcached), in a way that tells the children which parts of the cache (which keys of each hash) need to be updated or deleted, so that a child rarely, if ever, needs to query for anything more than those keys and can modify its own hashes directly to stay current.
That actually sounds pretty easy -- put a timestamp on your rows and only fetch the data that changed since the last time you asked.

- Perrin
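As an illustration of that last suggestion, here is a rough sketch of an incremental refresh driven by a last_modified column, assuming MySQL and DBI. The 'records' table, %cache hash, and refresh_cache() name are made up for the example:

    # Rough sketch: refresh only the keys whose rows changed since the
    # last check.  Table, column, and variable names are hypothetical.
    use strict;
    use warnings;
    use DBI;

    my %cache;        # the per-process hash being kept current
    my $last_check = '1970-01-01 00:00:00';

    sub refresh_cache {
        my ($dbh) = @_;

        # Use the db's clock so rows updated mid-refresh aren't missed.
        my ($now) = $dbh->selectrow_array('SELECT NOW()');

        # Fetch only the rows that changed since the last refresh.
        my $rows = $dbh->selectall_arrayref(
            'SELECT id, data FROM records WHERE last_modified >= ?',
            { Slice => {} },
            $last_check,
        );

        # Update just those keys; everything else in %cache stays put.
        $cache{ $_->{id} } = $_->{data} for @$rows;

        $last_check = $now;
    }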