my $.02
• LDAP would be silly unless you're clustering -- most implementations use BDB as their backend
• BDB and Cache::FastMmap would make more sense if you're on one machine
Also, I think your hash scheme may be better off rethought. You have:
$CACHE_1{$id} = 'foo';
$CACHE_2{$ida}{$idb} = 'bar';
which limits you to thinking in terms of Perl hash structures. If you emulate that with flattened keys:
cache_1_[\d+]
cache_2_[\w+]_[\w+]
then you have a lot more options. In a clustered system, you can use memcached or a dedicated MySQL (or whatever) daemon.
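For example, a rough sketch of that flattening (the helper names and delimiter are just illustrative, not anything specific to your setup):

use strict;
use warnings;

# flattened-key helpers -- names and delimiter are illustrative only
sub cache_1_key {
    my ($id) = @_;
    return "cache_1_$id";              # cache_1_[\d+]
}

sub cache_2_key {
    my ($ida, $idb) = @_;
    return "cache_2_${ida}_${idb}";    # cache_2_[\w+]_[\w+]
}

# the same lookups now work against any flat key/value store
# (memcached, BDB, a MySQL table), not just an in-process Perl hash:
# $cache->set(cache_1_key(42), 'foo');
# $cache->set(cache_2_key('artist', 7), 'bar');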
One of my projects, RoadSound, is a collaborative content management system where each 'view' into a set of data is from the independent perspective of each relevant entity and content manager.
Loosely translated -- to display the most basic details of a concert, I need to do four 15-to-20-table joins in Postgres, and I need to do & store that separately for each artist / venue / whatever involved. In order to offload the db, I store everything in memcached as I generate it, with a key like "show_%(id)s_%(perspective_type_id)s_%(perspective_owner_id)s". It doesn't perform nearly as fast as using shared memory, but it offloads A TON of work from my db and works across multiple machines.
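In Perl terms, the read-through pattern is roughly this (Cache::Memcached; the table and query below are hypothetical stand-ins for the real 15-20 table joins):

use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub show_details {
    my ($dbh, $show_id, $perspective_type_id, $perspective_owner_id) = @_;

    my $key = join '_', 'show', $show_id, $perspective_type_id, $perspective_owner_id;

    # serve the pre-built view from memcached if we already generated it
    my $cached = $memd->get($key);
    return $cached if $cached;

    # otherwise do the expensive joins once and cache the result
    my $row = $dbh->selectrow_hashref(
        'SELECT * FROM show_summaries WHERE show_id = ?', undef, $show_id);
    $memd->set($key, $row, 3600);    # expire after an hour, for example
    return $row;
}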
The only issue with this approach would be clearing out the '$y' level in this model: $CACHE_2{$y}{$z}. I don't know whether that is a concern for you or not, but with flat keys it could create issues.
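One common workaround is to fold a per-$y generation number into the flattened key, so bumping the generation effectively clears that whole level. Roughly:

use strict;
use warnings;

sub cache_2_key_versioned {
    my ($memd, $y, $z) = @_;
    my $gen = $memd->get("cache_2_gen_$y");
    unless (defined $gen) {
        $gen = 1;
        $memd->set("cache_2_gen_$y", $gen);
    }
    return "cache_2_${y}_g${gen}_${z}";
}

# "deleting" every $CACHE_2{$y}{...} entry is just bumping the generation;
# the old keys are never read again and eventually fall out of memcached
sub clear_cache_2_level {
    my ($memd, $y) = @_;
    $memd->incr("cache_2_gen_$y") or $memd->set("cache_2_gen_$y", 1);
}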
Also, depending on your current performance, you might be able to just use MySQL. You could conceivably do something that takes advantage of the speed of MEMORY or MyISAM tables and SELECT query caching. While that wouldn't be as fast as shared memory alone, it clusters.
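For example, something like a shared lookup table on the MEMORY engine, accessed over DBI (database, table, and column names here are made up):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=app;host=dbhost', 'user', 'pass',
                       { RaiseError => 1 });

# an in-memory table shared by every web node (MEMORY doesn't allow TEXT/BLOB,
# so the value column is a VARCHAR)
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS lookup_cache (
        cache_key   VARCHAR(255)  NOT NULL PRIMARY KEY,
        cache_value VARCHAR(2048)
    ) ENGINE=MEMORY
});

sub cache_get {
    my ($key) = @_;
    my ($val) = $dbh->selectrow_array(
        'SELECT cache_value FROM lookup_cache WHERE cache_key = ?', undef, $key);
    return $val;
}

sub cache_set {
    my ($key, $val) = @_;
    $dbh->do('REPLACE INTO lookup_cache (cache_key, cache_value) VALUES (?, ?)',
             undef, $key, $val);
}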
On May 19, 2007, at 6:13 PM, Will Fould wrote:
Thanks a lot Perrin -
I really like the current method (if it were to stay on one machine and not grow). Caching per child has not really been a problem once I got past the emotional hangup over what seemed like a duplicative waste of memory. I am totally amazed at how fast and efficient using mod_perl this way has been. The hash-building queries issued by the children are very simple selects, but the data they provide (and cache) is used in so many ways throughout the session that not having it would require extra joins in multiple places, and queries in other places that currently aren't needed at all (e.g. collaborative environment ACLs, etc.). To be clear, the hashes are not only for quick de-normalizing; they serve a vital caching function.
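For reference, the current setup is essentially a package-level hash loaded lazily in each child -- roughly like this, with names simplified:

package My::Lookup;
use strict;
use warnings;

my %CACHE;   # lives for the life of this Apache child

sub lookup {
    my ($dbh, $id) = @_;
    load_cache($dbh) unless %CACHE;
    return $CACHE{$id};
}

sub load_cache {
    my ($dbh) = @_;
    my $rows = $dbh->selectall_arrayref('SELECT id, label FROM lookup_table');
    %CACHE = map { $_->[0] => $_->[1] } @$rows;
}

1;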
The problem is that I am now moving the database off localhost and configuring a second web node.
> what it is that you don't like about your current method.
I'm afraid that:
1. the hashes get really big (more than a few MB each),
2. re-caching an entire hash just because one key was updated is wasteful,
3. there's latency in pulling cache data from the remote DB, and
4. every child has to do this.
For now, what seems like the 'holy grail' (*) is to cache a last_modified value for each type (available to the cluster, say through memcached) in a way that tells the children exactly which parts of the cache (which keys of each hash) need to be updated or deleted, so that a child rarely has to query at all -- and when it does, it queries for just those keys and patches its own hashes accordingly to stay current.
(*) I'm not too clear about this, but it seems like the real 'holy grail' would be to do this within Apache in a scoreboard-like way.
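Something like this, maybe, for the last_modified idea -- the key names and the dirty-key list are just how I imagine it, not a working design:

use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

my %CACHE;          # this child's copy of one lookup hash
my $last_seen = 0;  # the last_modified value this child has caught up to

sub refresh_if_stale {
    my ($dbh, $type) = @_;

    my $last_modified = $memd->get("last_modified_$type") || 0;
    return if $last_modified <= $last_seen;

    # writers would record which keys they touched when bumping last_modified
    my $dirty = $memd->get("dirty_keys_$type") || [];
    for my $key (@$dirty) {
        my ($val) = $dbh->selectrow_array(
            'SELECT value FROM lookup_table WHERE id = ?', undef, $key);
        if (defined $val) { $CACHE{$key} = $val }
        else              { delete $CACHE{$key} }
    }
    $last_seen = $last_modified;
}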
-w
On 5/19/07, Perrin Harkins <[EMAIL PROTECTED]> wrote:
On 5/19/07, Will Fould <[EMAIL PROTECTED]> wrote:
> Here's the situation: We have a fully normalized relational database
> (mysql) now being accessed by a web application and to save a lot of complex
> joins each time we grab rows from the database, I currently load and cache a
> few simple hashes (1-10MB) in each apache process with the corresponding
> lookup data
Are you certain this is saving you all that much, compared to just
doing the joins? With proper indexes, joins are fast. It could be a
win to do them yourself, but it depends greatly on how much of the
data you end up displaying before the lookup tables change and have to
be re-fetched.
> Is anyone doing something similar? I'm wondering if implementing a BerkeleyDB
> or another slave store on each web node with a tied hash (or something
> similar) is feasible and if not, what a better solution might be.
Well, first of all, I wouldn't feed a tied hash to my neighbor's dog.
It's slower than method calls, and more confusing.
There are lots of things you could do here, but it's not clear to me
what it is that you don't like about your current method. Is it that
when the database changes you have to do heavy queries from every
child process? That also kills any sharing of the data. Do you have
more than one server, or expect to soon?
- Perrin
// Jonathan Vanasco
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -