I'm going to be updating the project page with as much bad English as I can possibly write :)
If anyone else is interested in contributing to the project, please let me know.

On Dec 19, 8:33 pm, "Jeremy Dunck" <[EMAIL PROTECTED]> wrote:
> Here's the IRC chat from today:
>
> Most useful bits:
>
> django-orm-cache.googlecode.com now exists.
>
> todo:
> [10:29pm] jdunck: Investigate whether ferringb (of Curse) supplied signal performance patch
> [10:29pm] jdunck: Add multiget to all cache backends
> [10:29pm] jdunck: Add max_key_length, max_value_length to all cache backends
> [10:29pm] jdunck: add memcache's replace semantics for set-only-if-exists behavior
> [10:29pm] jdunck: Support splitting qs values over max_value_length (in other words, do multiple sets and gets for a single list of objects if needed)
> [10:29pm] jdunck: bench sha vs. (python) hsieh and jenkins
> [10:29pm] jdunck: test w/o CachedModel __metaclass__ since that's a bit silly.
> [10:29pm] jdunck: invalidate whole list if any key in list is missing - ask dcramer
> [10:29pm] jdunck: All related field descriptors should check cache first
> [10:29pm] jdunck: Port to qs-refactor
>
> Full transcript:
>
> [7:27pm] jdunck: so, orm-based caching
> [7:27pm] jdunck: seems like row-level-caching goog soc never went anywhere?
> [7:27pm] jdunck: (you have time to talk now?)
> [7:33pm] zeeg: ya i do
> [7:33pm] zeeg: what you mean goog soc
> [7:33pm] jdunck: http://django-object-level-caching.googlecode.com/
> [7:34pm] zeeg: oh
> [7:34pm] zeeg: i dont like that
> [7:34pm] zeeg: at all
> [7:34pm] zeeg: but ya
> [7:34pm] zeeg: you read over all my stuff?
> [7:34pm] jdunck: well, i saw it's monkey-patching the standard QS
> [7:34pm] jdunck: i did.
> [7:35pm] zeeg: ya really what I want at the core, is a magical CachedModel
> [7:35pm] zeeg: that can handle all invalidation that (for most uses) we need
> [7:35pm] zeeg: which is just delete invalidation
> [7:35pm] zeeg: then using signals for model level dependencies (via registration)
> [7:35pm] zeeg: and reverse key mappings (except a Model + pks mapping) for row-level
> [7:35pm] jdunck: yeah-- over the sprint, i talked to jacob about some new signal ideas-- and he said he doesn't want to add any signals without first improving signal performance.
> [7:36pm] jdunck: i *like* signals, so don't mind improving their performance
> [7:36pm] zeeg: ya i think trunk's signals still suck
> [7:36pm] zeeg: ferringb patched ours (he's one of the devs at Curse)
> [7:36pm] zeeg: im not sure if his patch made trunk tho
> [7:36pm] zeeg: but ya, signals for dependencies is the last of my concerns
> [7:36pm] zeeg: ill rely on expiration based caching mostly
> [7:36pm] zeeg: but being able to handle invalidation at the row level is.. beautiful
> [7:37pm] zeeg: its obviously a bit more of a performance hit handling caching like this, but my tests showed it wasn't big enough to matter
> [7:37pm] zeeg: i was even going to add in the pre-expiration routines
> [7:37pm] zeeg: (so if something expires in <predefined> minutes, it gets automatically locked and recached by the first person to see it)
> [7:38pm] jdunck: not sure what you mean by "locked"
> [7:38pm] zeeg: basically, when you set a key, you set either another key, or in that key you're setting you tokenize it
> [7:38pm] zeeg: and that other key, or the first token
> [7:39pm] zeeg: contains the expiration time, or expiration time - minutes
> [7:39pm] zeeg: and when you fetch that key
> [7:39pm] zeeg: if that expiration time has been reached (the pre-expiration), you set a lock value, which says, if anyone else is looking at this and checking, ignore it
> [7:39pm] zeeg: and then you recreate that cache
> [7:39pm] jdunck: ah. you assume no purging due to MRU memory limits?
> [7:39pm] zeeg: well ya, only so much you can plan for
> [7:39pm] zeeg: but w/ that, it potentially stops heavily accessed keys
> [7:40pm] zeeg: from being regenerated 100s of times
> [7:40pm] jdunck: fwiw, here's a wrapper i made to deal with the same problem: http://code.djangoproject.com/ticket/6199
> [7:40pm] zeeg: if they take too long to generate
> [7:40pm] zeeg: ah ya
> [7:40pm] zeeg: thats your code?
> [7:40pm] jdunck: yeah
> [7:40pm] zeeg: I think I saw that linked on memcached
> [7:40pm] zeeg: they talked about the usage at CNET and I thought it'd be a great addition
> [7:40pm] jdunck: hmm. i posted on that list a while back, but it wasn't a ticket at the time.
> [7:41pm] zeeg: ya i just remember seeing the code
> [7:41pm] jdunck: well, anyway, do you not like that approach? just wrapping stampedes for the whole backend?
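The pre-expiration locking scheme described above (and the wrapper in ticket #6199) could be sketched roughly as follows. This is a minimal illustration, not either implementation: the `StampedeCache` name, the key layout, and the `SOFT_TTL`/`LOCK_TTL` values are all assumptions, and `backend` stands for any object with Django-style `get`/`set`/`add` methods.

```python
import time

# Illustrative constants -- real values would be tuned per deployment.
SOFT_TTL = 60   # regenerate this many seconds before the real expiry
LOCK_TTL = 30   # how long one client may hold the regeneration lock


class StampedeCache:
    """Sketch of stampede protection via a 'soft' expiry stored
    alongside the value, as discussed in the transcript."""

    def __init__(self, backend):
        self.backend = backend  # any object with get/set/add

    def get_or_set(self, key, builder, timeout=300):
        now = time.time()
        entry = self.backend.get(key)
        if entry is not None:
            value, soft_expiry = entry
            if now < soft_expiry:
                return value
            # Soft-expired: only the first caller to win the add()
            # race rebuilds; everyone else keeps the stale value.
            if not self.backend.add(key + ':lock', 1, LOCK_TTL):
                return value
        value = builder()
        # Store the value together with its soft expiry time.
        self.backend.set(key, (value, now + timeout - SOFT_TTL), timeout)
        return value
```

The point of `add` here is that it is atomic on memcached: only one concurrent caller can create the lock key, so a hot key is regenerated once rather than hundreds of times.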
> [7:41pm] zeeg: and im like, cool, it must be useful if others are doing it
> [7:41pm] zeeg: well in the backend I think its the best approach actually
> [7:41pm] jdunck: i can see some ppl being annoyed that it has some book-keeping overhead and doesn't store exactly what you say to store.
> [7:41pm] zeeg: the way CNET did it, was they used 3 keys
> [7:42pm] zeeg: actual data, expiration key, and locking key
> [7:42pm] zeeg: which i can see benefits of doing it both in seperate keys, and in a combined key
> [7:43pm] jdunck: do you use gearman or some other background jobber?
> [7:43pm] zeeg: nope
> [7:43pm] zeeg: not familiar w/ them
> [7:44pm] jdunck: i mean, my understanding is that [EMAIL PROTECTED] went a totally different direction-- have a daemon that feeds in updated keys, so that the web app never misses keys
> [7:44pm] jdunck: (obviously doesn't work for ad hoc stuff)
> [7:44pm] zeeg: ah ya we do that for a few things
> [7:44pm] zeeg: only things that are slow to cache tho
> [7:44pm] jdunck: do you have a > 1MB memcached compilation?
> [7:44pm] jdunck: i was surprised to find that hard limit. QS results can easily reach that.
> [7:45pm] zeeg: one sec brb
> [7:45pm] zeeg: like 1mb in a key?
> [7:45pm] jdunck: yeah
> [7:45pm] jdunck: crazy-talk, i know.
> [7:45pm] jdunck: in your scheme, can you imagine a list of object keys getting to 1MB?
> [7:46pm] jdunck: 100 bytes per key, a list of 10000 object keys would result in ~1mb; missed key set in standard memcache
> [7:48pm] zeeg: hrm
> [7:48pm] zeeg: so you mean a cache that would store 10k objects in it?
> [7:50pm] jdunck: let me back up. a standard memcache will only store a key value of 1mb or less
> [7:50pm] jdunck: you can compile it to store more per key value
> [7:51pm] jdunck: we (pegnews.com) are currently throwing queryset results in cache
> [7:51pm] jdunck: sometimes that results in a miss because the qs is too big.
> [7:51pm] jdunck: we're silly for throwing in huge qs anyway, but quick-n-dirty mostly works
> [7:52pm] jdunck: anyway, if i understand correctly, your cacheqs would store hash(qs kwargs) as the key, and [ct_id:pk_val1, ct_id:pk_val2, ...] as the value
> [7:52pm] jdunck: each individual object has ct_id:pk_val:1 as the key, and the model instance as the value
> [7:52pm] jdunck: right?
> [7:53pm] jdunck: i was just pointing out that a result list long enough would still hit the 1mb limit, resulting in a miss on the qs key lookup.
> [7:55pm] zeeg: ya
> [7:55pm] zeeg: you'd still have the same limitation
> [7:55pm] zeeg: my plan was to store
> [7:55pm] zeeg: hrm
> [7:55pm] zeeg: what was my plan
> [7:55pm] jdunck: hah
> [7:55pm] zeeg: i think it was up in the air
> [7:55pm] zeeg: but it'd be like
> [7:55pm] zeeg: ModelClass,(pk, pk, pk, pk),(related, fields, to, select)
> [7:56pm] zeeg: feel free to poke holes
> [7:56pm] zeeg: the one issue i see
> [7:56pm] zeeg: im not sure how big ModelClass is
> [7:56pm] zeeg: when serialized
> [7:59pm] zeeg: but w/ this cool system
> [7:59pm] zeeg: if you *needed* to
> [7:59pm] zeeg: you could say "oh shit im trying to insert too much"
> [8:00pm] zeeg: and be like ModelClass, (pks*,), (fields*), number_of_keys
> [8:00pm] zeeg: and split it into multiple keys
> [8:00pm] zeeg: it would be nearly just as fast
> [8:00pm] zeeg: basing off of my multi-get bench results
> [8:00pm] zeeg: thats what i like about taking this approach
> [8:00pm] zeeg: is the developer doesnt have to worry about any of that
> [8:04pm] zeeg: im actually hoping to get a rough version of this done over the holidays while im on vaca
> [8:05pm] jdunck: the (related,fields,to,select) bit above is FK/M2M rels to follow?
> [8:05pm] zeeg: select_related more or less
> [8:05pm] zeeg: so it knows what to lookup in the batch keys when it grabs it
> [8:05pm] jdunck: yeah.. i wonder what select_related does for cycles...
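The scheme being discussed (hash the queryset kwargs into one key whose value is a list of per-object keys, and split that list across several keys when it would blow the backend's value limit) might look roughly like this. The function names, the key format, and the chunking-by-summed-key-length heuristic are illustrative assumptions, not the project's actual design:

```python
import hashlib

MAX_VALUE_LENGTH = 1024 * 1024  # memcached's default 1MB value limit


def qs_cache_key(model_label, **kwargs):
    """One stable key per (model, filter-kwargs) combination.
    Sorting the kwargs makes the key independent of argument order."""
    digest = hashlib.sha1(repr(sorted(kwargs.items())).encode()).hexdigest()
    return '%s:%s' % (model_label, digest)


def split_key_list(object_keys, max_len=MAX_VALUE_LENGTH):
    """Split a list of per-object keys into chunks whose total key
    length stays under the backend's value limit (a rough bound --
    a real implementation would measure the serialized size)."""
    chunks, current, size = [], [], 0
    for k in object_keys:
        if current and size + len(k) > max_len:
            chunks.append(current)
            current, size = [], 0
        current.append(k)
        size += len(k)
    if current:
        chunks.append(current)
    return chunks
```

With the list split this way, one multi-get fetches all the chunk keys, and a second multi-get fetches the objects themselves, which is the "nearly just as fast" property zeeg's multi-get benchmarks point at.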
> [8:05pm] zeeg: so it does select list -> select list of pks (batch) -> select huge batch of related_fields
> [8:05pm] jdunck: yeah, i follow
> [8:05pm] zeeg: although that potentially may have to be split up too
> [8:06pm] zeeg: is there a limit on how much data sends back and forth between memcached
> [8:06pm] jdunck: yeah, that's a simple abstraction, no biggie
> [8:06pm] zeeg: or is that the 1mb you were referring to (i was assuming storage)
> [8:06pm] jdunck: it's 1mb per key value by default in memcache.
> [8:06pm] zeeg: k
> [8:06pm] jdunck: other backends are different, i'm sure
> [8:06pm] zeeg: ya dont care about those tho
> [8:06pm] zeeg: if anyone uses anything else they're not looking for the kind of performance this is aimed at
> [8:07pm] zeeg: but in theory, it'd support them
> [8:07pm] zeeg: (i dont think they allow multi-gets tho, so it probably does them one at a time)
> [8:07pm] jdunck: i don't really care about them either, but if this is to go in core, we probly should make max_value_size and supports_multiget as vals on the cache backend
> [8:08pm] zeeg: doesnt cache backend all have multi get by default?
> [8:08pm] zeeg: i saw it in the memcached code so i assumed it was across the board
> [8:08pm] zeeg: (i want to personally add incr/decr into the cache backend)
> [8:08pm] zeeg: thats another thing id like to potentially support with this, is namespaces
> [8:08pm] zeeg: but thats another pretty big addition
> [8:08pm] zeeg: and can come later
> [8:08pm] jdunck: nope, not in file, for example
> [8:08pm] jdunck: easy to add, tho, that's a good point
> [8:09pm] zeeg: but being that cache keys are db_table:hash, should be fairly easy
> [8:09pm] jdunck: honestly, i don't get what incr/decr does. are you hand-rolling ref-counting on something?
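The `max_value_size` / `supports_multiget` capability flags jdunck proposes could be sketched as a thin wrapper: advertise what the backend can do, and fall back to one `get` per key for backends (like the file backend) that have no native multi-get. The class name and the `get_many` method name are assumptions for illustration:

```python
class BackendWrapper:
    """Sketch of per-backend capability flags, as proposed above.
    Wraps any object exposing at least get(key)."""

    def __init__(self, backend, max_value_size=1024 * 1024):
        self.backend = backend
        self.max_value_size = max_value_size
        # A backend "supports multiget" if it exposes a batch getter.
        self.supports_multiget = hasattr(backend, 'get_many')

    def get_many(self, keys):
        if self.supports_multiget:
            return self.backend.get_many(keys)
        # Fallback: one round-trip per key, as the file backend
        # would need -- correct, just slower.
        result = {}
        for key in keys:
            value = self.backend.get(key)
            if value is not None:
                result[key] = value
        return result
```

Callers then code against `get_many` everywhere and let the wrapper decide whether the batch is one network round-trip or many, which keeps the ORM-cache layer backend-agnostic.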
> [8:09pm] jdunck: i mean, i understand what the primitive does, i'm just not smart enough to see the point
> [8:10pm] zeeg: ya if you used namespaces it could help
> [8:10pm] zeeg: if you were threaded
> [8:10pm] zeeg: and you did cache.get then cache.set
> [8:10pm] zeeg: it ...
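The point zeeg starts to make about incr/decr is the classic get-then-set race: two threads can both read the same counter value and both write back value+1, losing an update, whereas a backend-side `incr` applies the increment atomically. A minimal in-process sketch of that atomic behavior (the `CounterCache` class is illustrative; on a real memcached backend the server's incr command plays the role of the lock):

```python
import threading


class CounterCache:
    """Toy in-memory cache whose incr() is atomic, standing in for
    memcached's server-side incr command."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # The lock makes read-modify-write one indivisible step,
        # which is exactly what cache.get + cache.set cannot give you.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + delta
            return self._data[key]

    def get(self, key):
        return self._data.get(key)
```

With plain get-then-set, thread A reads 5, thread B reads 5, and both write 6; with an atomic incr, the two updates serialize and the counter reads 7.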
