my $.02
• LDAP would be silly unless you're clustering -- most implementations use BDB as their backend
• BDB and Cache::FastMmap would make more sense if you're on one machine
Also, I think your hash scheme may be better off rethought. You have:
$CACHE_1{$id} = 'foo';
$CACHE_2{$ida}{$idb} = 'bar';
which limits you to thinking in terms of Perl hash structures. If you emulate that with flattened keys:
cache_1_[\d+]
cache_2_[\w+]_[\w+]
then you have a lot more options. In a clustered system, you can use memcached or a dedicated MySQL (or whatever) daemon.
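For example, a rough sketch of that flattening (the helper names and delimiter are just illustrative, not anything specific to your setup):

use strict;
use warnings;

# flattened-key helpers -- names and delimiter are illustrative only
sub cache_1_key {
    my ($id) = @_;
    return "cache_1_$id";              # cache_1_[\d+]
}

sub cache_2_key {
    my ($ida, $idb) = @_;
    return "cache_2_${ida}_${idb}";    # cache_2_[\w+]_[\w+]
}

# the same lookups now work against any flat key/value store
# (memcached, BDB, a MySQL table), not just an in-process Perl hash:
# $cache->set(cache_1_key(42), 'foo');
# $cache->set(cache_2_key('artist', 7), 'bar');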
One of my projects, RoadSound, is a collaborative content management system where each 'view' into a set of data is from the independent perspective of each relevant entity and content manager.
Loosely translated -- to display the most basic details of a concert, I need to do four 15-to-20-table joins in Postgres, and I need to do & store that separately for each artist / venue / whatever involved. In order to offload the db, I store everything in memcached as I generate it, with a key like "show_%(id)s_%(perspective_type_id)s_%(perspective_owner_id)s". It doesn't perform nearly as fast as using shared memory, but it offloads A TON of work from my db and works across multiple machines.
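In Perl terms, the read-through pattern is roughly this (Cache::Memcached; the table and query below are hypothetical stand-ins for the real 15-20 table joins):

use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub show_details {
    my ($dbh, $show_id, $perspective_type_id, $perspective_owner_id) = @_;

    my $key = join '_', 'show', $show_id, $perspective_type_id, $perspective_owner_id;

    # serve the pre-built view from memcached if we already generated it
    my $cached = $memd->get($key);
    return $cached if $cached;

    # otherwise do the expensive joins once and cache the result
    my $row = $dbh->selectrow_hashref(
        'SELECT * FROM show_summaries WHERE show_id = ?', undef, $show_id);
    $memd->set($key, $row, 3600);    # expire after an hour, for example
    return $row;
}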
The only issue with this approach would be clearing out the '$y' level in this model: $CACHE_2{$y}{$z}. I don't know whether that is a concern for you or not, but with flat keys it could create issues.
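One common workaround is to fold a per-$y generation number into the flattened key, so bumping the generation effectively clears that whole level. Roughly:

use strict;
use warnings;

sub cache_2_key_versioned {
    my ($memd, $y, $z) = @_;
    my $gen = $memd->get("cache_2_gen_$y");
    unless (defined $gen) {
        $gen = 1;
        $memd->set("cache_2_gen_$y", $gen);
    }
    return "cache_2_${y}_g${gen}_${z}";
}

# "deleting" every $CACHE_2{$y}{...} entry is just bumping the generation;
# the old keys are never read again and eventually fall out of memcached
sub clear_cache_2_level {
    my ($memd, $y) = @_;
    $memd->incr("cache_2_gen_$y") or $memd->set("cache_2_gen_$y", 1);
}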
Also, depending on your current performance, you might be able to just use MySQL. You could conceivably do something that takes advantage of the speed of MEMORY or MyISAM tables and SELECT query caching. While that wouldn't be as fast as shared memory alone, it clusters.
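For example, something like a shared lookup table on the MEMORY engine, accessed over DBI (database, table, and column names here are made up):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=app;host=dbhost', 'user', 'pass',
                       { RaiseError => 1 });

# an in-memory table shared by every web node (MEMORY doesn't allow TEXT/BLOB,
# so the value column is a VARCHAR)
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS lookup_cache (
        cache_key   VARCHAR(255)  NOT NULL PRIMARY KEY,
        cache_value VARCHAR(2048)
    ) ENGINE=MEMORY
});

sub cache_get {
    my ($key) = @_;
    my ($val) = $dbh->selectrow_array(
        'SELECT cache_value FROM lookup_cache WHERE cache_key = ?', undef, $key);
    return $val;
}

sub cache_set {
    my ($key, $val) = @_;
    $dbh->do('REPLACE INTO lookup_cache (cache_key, cache_value) VALUES (?, ?)',
             undef, $key, $val);
}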
On May 19, 2007, at 6:13 PM, Will Fould wrote:
Thanks a lot Perrin -
I really like the current method (if it were to stay on one machine and not grow). Caching per child has not really been a problem once I got past the emotional hangup over what seemed like a duplicative waste of memory. I am totally amazed at how fast and efficient using mod_perl this way has been. The hash-building queries issued by the children are very simple selects, but the data they provide (and cache) is used in so many ways throughout the session that not having it would require extra joins in multiple places, and queries in other places that currently aren't needed at all (e.g. collaborative environment ACLs, etc.). To be clear, the hashes are not only for quick de-normalizing; they serve a vital caching function.
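For reference, the current setup is essentially a package-level hash loaded lazily in each child -- roughly like this, with names simplified:

package My::Lookup;
use strict;
use warnings;

my %CACHE;   # lives for the life of this Apache child

sub lookup {
    my ($dbh, $id) = @_;
    load_cache($dbh) unless %CACHE;
    return $CACHE{$id};
}

sub load_cache {
    my ($dbh) = @_;
    my $rows = $dbh->selectall_arrayref('SELECT id, label FROM lookup_table');
    %CACHE = map { $_->[0] => $_->[1] } @$rows;
}

1;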
The problem is that I am now moving the database off localhost and configuring a second web node.
> what it is that you don't like about your current method.
I'm afraid that:
1. the hashes get really big (more than a few MB each),
2. re-caching an entire hash just because one key was updated is wasteful,
3. there's latency in pulling cache data from the remote DB, and
4. every child has to do this.
For now, what seems like the 'holy grail' (*) is to cache a last_modified value for each type (available to the cluster, say through memcached) in a way that tells the children exactly which parts of the cache (which keys of each hash) need to be updated or deleted, so that a child rarely has to query at all -- and when it does, it queries for just those keys and patches its own hashes accordingly to stay current.
(*) I'm not too clear about this, but it seems like the real 'holy grail' would be to do this within Apache in a scoreboard-like way.
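Something like this, maybe, for the last_modified idea -- the key names and the dirty-key list are just how I imagine it, not a working design:

use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

my %CACHE;          # this child's copy of one lookup hash
my $last_seen = 0;  # the last_modified value this child has caught up to

sub refresh_if_stale {
    my ($dbh, $type) = @_;

    my $last_modified = $memd->get("last_modified_$type") || 0;
    return if $last_modified <= $last_seen;

    # writers would record which keys they touched when bumping last_modified
    my $dirty = $memd->get("dirty_keys_$type") || [];
    for my $key (@$dirty) {
        my ($val) = $dbh->selectrow_array(
            'SELECT value FROM lookup_table WHERE id = ?', undef, $key);
        if (defined $val) { $CACHE{$key} = $val }
        else              { delete $CACHE{$key} }
    }
    $last_seen = $last_modified;
}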
-w
On 5/19/07, Perrin Harkins <[EMAIL PROTECTED]> wrote:
On 5/19/07, Will Fould <[EMAIL PROTECTED]> wrote:
> Here's the situation: We have a fully normalized relational database
> (mysql) now being accessed by a web application and to save a lot of complex
> joins each time we grab rows from the database, I currently load and cache a
> few simple hashes (1-10MB) in each apache process with the corresponding
> lookup data
Are you certain this is saving you all that much, compared to just
doing the joins? With proper indexes, joins are fast. It could be a
win to do them yourself, but it depends greatly on how much of the
data you end up displaying before the lookup tables change and have to
be re-fetched.
> Is anyone doing something similar? I'm wondering if implementing a BerkeleyDB
> or another slave store on each web node with a tied hash (or something
> similar) is feasible and if not, what a better solution might be.
Well, first of all, I wouldn't feed a tied hash to my neighbor's dog.
It's slower than method calls, and more confusing.
There are lots of things you could do here, but it's not clear to me
what it is that you don't like about your current method. Is it that
when the database changes you have to do heavy queries from every
child process? That also kills any sharing of the data. Do you have
more than one server, or expect to soon?
- Perrin
// Jonathan Vanasco
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -