Here's the IRC chat from today.

Most useful bits:
django-orm-cache.googlecode.com now exists.

todo:
[10:29pm] jdunck: Investigate whether ferringb (of Curse) supplied signal performance patch
[10:29pm] jdunck: Add multiget to all cache backends
[10:29pm] jdunck: Add max_key_length, max_value_length to all cache backends
[10:29pm] jdunck: Add memcache's replace semantics for replace-only-if-exists semantics
[10:29pm] jdunck: Support splitting qs values over max_value_length (in other words, do multiple sets and gets for a single list of objects if needed)
[10:29pm] jdunck: Bench sha vs. (python) hsieh and jenkins
[10:29pm] jdunck: Test w/o CachedModel __metaclass__ since that's a bit silly.
[10:29pm] jdunck: Invalidate whole list if any key in list is missing - ask dcramer
[10:29pm] jdunck: All related field descriptors should check cache first
[10:29pm] jdunck: Port to qs-refactor

Full transcript:
[7:27pm] jdunck: so, orm-based caching
[7:27pm] jdunck: seems like row-level-caching goog soc never went anywhere?
[7:27pm] jdunck: (you have time to talk now?)
[7:33pm] zeeg: ya i do
[7:33pm] zeeg: what you mean goog soc
[7:33pm] jdunck: http://django-object-level-caching.googlecode.com/
[7:34pm] zeeg: oh
[7:34pm] zeeg: i dont like that
[7:34pm] zeeg: at all
[7:34pm] zeeg: but ya
[7:34pm] zeeg: you read over all my stuff?
[7:34pm] jdunck: well, i saw it's monkey-patching the standard QS
[7:34pm] jdunck: i did.
[7:35pm] zeeg: ya really what I want at the core, is a magical CachedModel
[7:35pm] zeeg: that can handle all invalidation that (for most uses) we need
[7:35pm] zeeg: which is just delete invalidation
[7:35pm] zeeg: then using signals for model level dependencies (via registration)
[7:35pm] zeeg: and reverse key mappings (except a Model + pks mapping) for row-level
[7:35pm] jdunck: yeah-- over the sprint, i talked to jacob about some new signal ideas-- and he said he doesn't want to add any signals without first improving signal performance.
[7:36pm] jdunck: i *like* signals, so don't mind improving their performance
[7:36pm] zeeg: ya i think trunk's signals still suck
[7:36pm] zeeg: ferringb patched ours (he's one of the devs at Curse)
[7:36pm] zeeg: im not sure if his patch made trunk tho
[7:36pm] zeeg: but ya, signals for dependencies is the last of my concerns
[7:36pm] zeeg: ill rely on expiration based caching mostly
[7:36pm] zeeg: but being able to handle invalidation at the row level is.. beautiful
[7:37pm] zeeg: its obviously a bit more of a performance hit handling caching like this, but my tests showed it wasn't big enough to matter
[7:37pm] zeeg: i was even going to add in the pre-expiration routines
[7:37pm] zeeg: (so if something expires in <predefined> minutes, it gets automatically locked and recached by the first person to see it)
[7:38pm] jdunck: not sure what you mean by "locked"
[7:38pm] zeeg: basically, when you set a key, you either set another key, or in that key you're setting you tokenize it
[7:38pm] zeeg: and that other key, or the first token
[7:39pm] zeeg: contains the expiration time, or expiration time - minutes
[7:39pm] zeeg: and when you fetch that key
[7:39pm] zeeg: if that expiration time has been reached (the pre-expiration), you set a lock value, which says, if anyone else is looking at this and checking, ignore it
[7:39pm] zeeg: and then you recreate that cache
[7:39pm] jdunck: ah. you assume no purging due to MRU memory limits?
[7:39pm] zeeg: well ya, only so much you can plan for
[7:39pm] zeeg: but w/ that, it potentially stops heavily accessed keys
[7:40pm] zeeg: from being regenerated 100s of times
[7:40pm] jdunck: fwiw, here's a wrapper i made to deal with the same problem: http://code.djangoproject.com/ticket/6199
[7:40pm] zeeg: if they take too long to generate
[7:40pm] zeeg: ah ya
[7:40pm] zeeg: thats your code?
[7:40pm] jdunck: yeah
[7:40pm] zeeg: I think I saw that linked on memcached
[7:40pm] zeeg: they talked about the usage at CNET and I thought it'd be a great addition
[7:40pm] jdunck: hmm. i posted on that list a while back, but it wasn't a ticket at the time.
[7:41pm] zeeg: ya i just remember seeing the code
[7:41pm] jdunck: well, anyway, do you not like that approach? just wrapping stampedes for the whole backend?
[7:41pm] zeeg: and im like, cool, it must be useful if others are doing it
[7:41pm] zeeg: well in the backend I think its the best approach actually
[7:41pm] jdunck: i can see some ppl being annoyed that it has some book-keeping overhead and doesn't store exactly what you say to store.
[7:41pm] zeeg: the way CNET did it, was they used 3 keys
[7:42pm] zeeg: actual data, expiration key, and locking key
[7:42pm] zeeg: which i can see benefits of doing both in separate keys, and in a combined key
[7:43pm] jdunck: do you use gearman or some other background jobber?
[7:43pm] zeeg: nope
[7:43pm] zeeg: not familiar w/ them
[7:44pm] jdunck: i mean, my understanding is that [EMAIL PROTECTED] went a totally different direction-- have a daemon that feeds in updated keys, so that the web app never misses keys
[7:44pm] jdunck: (obviously doesn't work for ad hoc stuff)
[7:44pm] zeeg: ah ya we do that for a few things
[7:44pm] zeeg: only things that are slow to cache tho
[7:44pm] jdunck: do you have a > 1MB memcached compilation?
[7:44pm] jdunck: i was surprised to find that hard limit. QS results can easily reach that.
[7:45pm] zeeg: one sec brb
[7:45pm] zeeg: like 1mb in a key?
[7:45pm] jdunck: yeah
[7:45pm] jdunck: crazy-talk, i know.
[7:45pm] jdunck: in your scheme, can you imagine a list of object keys getting to 1MB?
[7:46pm] jdunck: 100 bytes per key, a list of 10000 object keys would result in ~1mb; missed key set in standard memcache
[7:48pm] zeeg: hrm
[7:48pm] zeeg: so you mean a cache that would store 10k objects in it?
[7:50pm] jdunck: let me back up.
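The three-key scheme zeeg attributes to CNET above (data key, expiration key, locking key) might look roughly like the following. This is a minimal sketch, not the code from ticket 6199: a plain dict stands in for memcached, and `STALE_AHEAD`, the key suffixes, and both function names are invented for illustration.

```python
import time

cache = {}  # stand-in for a memcached client: key -> value

STALE_AHEAD = 60  # start refreshing this many seconds before the real expiry

def set_with_stampede_guard(key, value, timeout):
    # Data key, plus a "soft" expiry key that fires before memcached's own
    # expiry would; clear any leftover lock from a previous rebuild.
    cache[key] = value
    cache[key + ':expires'] = time.time() + timeout - STALE_AHEAD
    cache.pop(key + ':lock', None)

def get_with_stampede_guard(key, regenerate, timeout=300):
    value = cache.get(key)
    soft_expiry = cache.get(key + ':expires', 0)
    if value is not None and time.time() < soft_expiry:
        return value  # still fresh
    # Stale or missing: the first caller takes the lock and rebuilds;
    # concurrent callers keep serving the old value until the rebuild lands.
    if value is not None and cache.get(key + ':lock'):
        return value
    cache[key + ':lock'] = True
    fresh = regenerate()
    set_with_stampede_guard(key, fresh, timeout)
    return fresh
```

The point of the lock is exactly what zeeg describes: a heavily accessed key that is slow to regenerate gets rebuilt once, by whoever hits the soft expiry first, instead of hundreds of times.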
a standard memcache will only store a key value of 1mb or less
[7:50pm] jdunck: you can compile it to store more per key value
[7:51pm] jdunck: we (pegnews.com) are currently throwing queryset results in cache
[7:51pm] jdunck: sometimes that results in a miss because the qs is too big.
[7:51pm] jdunck: we're silly for throwing in huge qs anyway, but quick-n-dirty mostly works
[7:52pm] jdunck: anyway, if i understand correctly, your cached qs would store hash(qs kwargs) as the key, and [ct_id:pk_val1, ct_id:pk_val2, ...] as the value
[7:52pm] jdunck: each individual object has ct_id:pk_val as the key, and the model instance as the value
[7:52pm] jdunck: right?
[7:53pm] jdunck: i was just pointing out that a result list long enough would still hit the 1mb limit, resulting in a miss on the qs key lookup.
[7:55pm] zeeg: ya
[7:55pm] zeeg: you'd still have the same limitation
[7:55pm] zeeg: my plan was to store
[7:55pm] zeeg: hrm
[7:55pm] zeeg: what was my plan
[7:55pm] jdunck: hah
[7:55pm] zeeg: i think it was up in the air
[7:55pm] zeeg: but it'd be like
[7:55pm] zeeg: ModelClass,(pk, pk, pk, pk),(related, fields, to, select)
[7:56pm] zeeg: feel free to poke holes
[7:56pm] zeeg: the one issue i see
[7:56pm] zeeg: im not sure how big ModelClass is
[7:56pm] zeeg: when serialized
[7:59pm] zeeg: but w/ this cool system
[7:59pm] zeeg: if you *needed* to
[7:59pm] zeeg: you could say "oh shit im trying to insert too much"
[8:00pm] zeeg: and be like ModelClass, (pks*,), (fields*), number_of_keys
[8:00pm] zeeg: and split it into multiple keys
[8:00pm] zeeg: it would be nearly just as fast
[8:00pm] zeeg: basing off of my multi-get bench results
[8:00pm] zeeg: thats what i like about taking this approach
[8:00pm] zeeg: is the developer doesnt have to worry about any of that
[8:04pm] zeeg: im actually hoping to get a rough version of this done over the holidays while im on vaca
[8:05pm] jdunck: the (related,fields,to,select) bit above is FK/M2M rels to follow?
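The layout restated above (hash of the qs kwargs as the parent key, a list of ct_id:pk strings as the value, split into numbered chunks plus a number_of_keys when the value would exceed the 1MB limit) could be sketched like this. A dict stands in for memcached, and `MAX_VALUE_LEN` and all function names are hypothetical, not the project's actual API.

```python
cache = {}                    # stand-in for memcached
MAX_VALUE_LEN = 1024 * 1024   # memcached's default per-value limit

def qs_key(model_table, **kwargs):
    # Sort the filter kwargs so equivalent queries hash identically.
    # Note the builtin hash() is only stable within one process, which
    # is part of what the sha discussion later in the chat is about.
    return '%s:%s' % (model_table, hash(tuple(sorted(kwargs.items()))))

def set_result_list(key, object_keys):
    # Join the "ct_id:pk" strings, split the blob into chunks that each
    # fit under the value limit, and record the chunk count under the
    # parent key (the "number_of_keys" idea from the chat).
    blob = ','.join(object_keys)
    chunks = [blob[i:i + MAX_VALUE_LEN]
              for i in range(0, len(blob), MAX_VALUE_LEN)] or ['']
    cache[key] = len(chunks)
    for n, chunk in enumerate(chunks):
        cache['%s:%d' % (key, n)] = chunk

def get_result_list(key):
    count = cache.get(key)
    if count is None:
        return None
    blob = ''.join(cache['%s:%d' % (key, n)] for n in range(count))
    return blob.split(',') if blob else []
```

Because chunks are rejoined before splitting on commas, a chunk boundary falling mid-key is harmless; the developer-facing API never sees the splitting, which is the property zeeg values.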
[8:05pm] zeeg: select_related more or less
[8:05pm] zeeg: so it knows what to look up in the batch keys when it grabs it
[8:05pm] jdunck: yeah.. i wonder what select_related does for cycles...
[8:05pm] zeeg: so it does select list -> select list of pks (batch) -> select huge batch of related_fields
[8:05pm] jdunck: yeah, i follow
[8:05pm] zeeg: although that potentially may have to be split up too
[8:06pm] zeeg: is there a limit on how much data goes back and forth between memcached
[8:06pm] jdunck: yeah, that's a simple abstraction, no biggie
[8:06pm] zeeg: or is that the 1mb you were referring to (i was assuming storage)
[8:06pm] jdunck: it's 1mb per key value by default in memcache.
[8:06pm] zeeg: k
[8:06pm] jdunck: other backends are different, i'm sure
[8:06pm] zeeg: ya dont care about those tho
[8:06pm] zeeg: if anyone uses anything else they're not looking for the kind of performance this is aimed at
[8:07pm] zeeg: but in theory, it'd support them
[8:07pm] zeeg: (i dont think they allow multi-gets tho, so it probably does them one at a time)
[8:07pm] jdunck: i don't really care about them either, but if this is to go in core, we probably should make max_value_size and supports_multiget vals on the cache backend
[8:08pm] zeeg: dont cache backends all have multi get by default?
[8:08pm] zeeg: i saw it in the memcached code so i assumed it was across the board
[8:08pm] zeeg: (i want to personally add incr/decr into the cache backend)
[8:08pm] zeeg: thats another thing id like to potentially support with this, is namespaces
[8:08pm] zeeg: but thats another pretty big addition
[8:08pm] zeeg: and can come later
[8:08pm] jdunck: nope, not in file, for example
[8:08pm] jdunck: easy to add, tho, that's a good point
[8:09pm] zeeg: but being that cache keys are db_table:hash, should be fairly easy
[8:09pm] jdunck: honestly, i don't get what incr/decr does. are you hand-rolling ref-counting on something?
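The multiget to-do above could be met with a base-class fallback: backends with a native multiget (memcached) override `get_many`, everything else (file, db) inherits a loop over `get()`. A sketch, not Django's actual backend code; the capability attributes mirror the max_value_size / supports_multiget idea from the chat.

```python
class BaseCache(object):
    # Capability values the ORM cache could consult, as discussed.
    max_value_length = 1024 * 1024
    supports_multiget = False

    def get(self, key, default=None):
        raise NotImplementedError

    def get_many(self, keys):
        # Fallback multiget for backends without a native one: a get()
        # per key, skipping misses, returning a dict the way
        # python-memcached's get_multi does.
        result = {}
        for key in keys:
            value = self.get(key)
            if value is not None:
                result[key] = value
        return result

class DictCache(BaseCache):
    # Minimal concrete backend for demonstration.
    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)
```

As zeeg notes, the fallback does the gets one at a time, so it is correct but loses the round-trip savings that make multiget attractive on memcached.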
[8:09pm] jdunck: i mean, i understand what the primitive does, i'm just not smart enough to see the point
[8:10pm] zeeg: ya if you used namespaces it could help
[8:10pm] zeeg: if you were threaded
[8:10pm] zeeg: and you did cache.get then cache.set
[8:10pm] zeeg: it could be invalid
[8:10pm] zeeg: vs cache.incr
[8:10pm] zeeg: or w/e
[8:11pm] jdunck: are you making your code avl in hg somewhere?
[8:12pm] jdunck: i mean, how do i contribute
[8:13pm] zeeg: hrm i can see what it'd take to get a branch setup on djangoproj
[8:13pm] zeeg: i dont think i can set it up on curse
[8:13pm] zeeg: as i think ours requires auth
[8:13pm] zeeg: (not too familiar w/ setup)
[8:13pm] jdunck: yeah
[8:14pm] zeeg: the code ive got so far is just a copy paste of modelbase/model editing the parts that need changing, and a copy paste of our current "CacheManager" code which does no invalidation whatsoever
[8:14pm] jdunck: i'm tracking 2 branches already. my head's gonna explode. i was just hoping to take changesets from you and quilt them or something.
[8:14pm] zeeg: ah ya
[8:14pm] zeeg: id rather it not even be in a branch honestly
[8:14pm] zeeg: id rather just external it in my cur stuff
[8:14pm] zeeg: much easier
[8:14pm] zeeg: actually i can set it up on google code i think
[8:15pm] zeeg: i think the biggest factor of the current code, is the cachemanager -- need to guarantee no conflicts w/ the cache key
[8:16pm] zeeg: need a name
[8:16pm] jdunck: when you say cache manager, do you mean like a regular manager that returns a cacheqs from get_query_set, or is this something else?
[8:16pm] zeeg: ya it caches all queries
[8:16pm] zeeg: http://www.pastethat.com/?NYxG2
[8:16pm] zeeg: its probably not the best approach
[8:16pm] zeeg: but its what i had working
[8:17pm] zeeg: clean/reset are something im unsure of too
[8:17pm] zeeg: clean can probably be removed
[8:17pm] zeeg: (merged w/ reset more or less)
[8:17pm] zeeg: clean was to delete it from a cache, which...
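zeeg's answer on incr/decr is about atomicity: under concurrency, `cache.get()` followed by `cache.set()` has a window in which another thread can write, and one of the two updates is silently lost; a server-side `incr` has no such window. A toy illustration, with a lock playing the role of memcached's atomic incr (the class and key names are invented):

```python
import threading

class CounterCache(object):
    """Toy backend showing why an atomic incr beats get() then set()."""
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def incr(self, key, delta=1):
        # The whole read-modify-write happens under one lock, modelling
        # memcached's server-side atomic incr.  With separate get()/set()
        # calls, another thread could write between the two calls and its
        # update would be lost.
        with self._lock:
            self._store[key] = self._store.get(key, 0) + delta
            return self._store[key]

cache = CounterCache()
cache.set('ns:version', 1)
# Bumping a namespace version with incr invalidates every key derived
# from it, with no race between readers and the bump.
new_version = cache.incr('ns:version')
```

This is the namespace use zeeg hints at: keys embed the namespace version, so incrementing the version implicitly invalidates the whole namespace.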
well i think (now) its a waste to ever delete from the cache
[8:17pm] zeeg: only to set over something, and let memcached delete
[8:17pm] zeeg: but maybe for non-memcached users its useful
[8:18pm] jdunck: django-orm-cache ? django-cachemanager ?
[8:19pm] jdunck: as for ensuring no collision, simplest to just sort all args into consistent order, then sha it, no?
[8:20pm] jdunck: or are you trying for readable keys?
[8:20pm] zeeg: im using hash
[8:20pm] zeeg: i didnt want to sha
[8:20pm] zeeg: or md5
[8:20pm] zeeg: as i figured that'd be slow
[8:20pm] jdunck: mebbe sha is too expensive, yeah
[8:20pm] zeeg: but it does sort etc first
[8:20pm] jdunck: there are lots of hash algs out there
[8:20pm] zeeg: http://code.google.com/p/django-orm-cache/
[8:20pm] zeeg: gonna put the cache.py i have up there real quick
[8:22pm] zeeg: k its committed there
[8:22pm] jdunck: but collision is v. important to avoid.
[8:22pm] zeeg: whats your google account name?
[8:22pm] jdunck: [EMAIL PROTECTED]
[8:22pm] zeeg: alright added you on the project
[8:22pm] jdunck: python's sha is in c, i expect
[8:22pm] jdunck: danke
[8:22pm] zeeg: ya we can bench that
[8:22pm] zeeg: i have to run though, ill be home in a few hours
[8:23pm] jdunck: k
[8:23pm] jdunck: interesting hashes
[8:23pm] jdunck: http://www.burtleburtle.net/bob/hash/doobs.html
[8:23pm] jdunck: http://www.azillionmonkeys.com/qed/hash.html
[8:23pm] jdunck: you ok w/ me posting irc archives somewhere? marty wanted to be in on discussion
[9:54pm] zeeg: back
[10:05pm] jdunck: hello
[10:06pm] jdunck: so, i was just going to post the irc log to the list thread, k?
[10:06pm] zeeg: get a chance to look over stuff
[10:06pm] zeeg: ya go ahead
[10:09pm] jdunck: well, i'd seen cachedmodel before.. but its __new__ is the same as dj trunk, right?
[10:10pm] jdunck: you're just using it to automagically supply objects=CacheManager ?
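The collision-avoidance approach jdunck suggests above, sort all the args into a consistent order and then sha the result, might look like this (`make_key` is a hypothetical name; this uses hashlib's C-implemented sha1, which is the "python's sha is in c" point):

```python
import hashlib

def make_key(table, kwargs):
    # Sort the filter kwargs into a canonical order, then digest.  Unlike
    # the builtin hash(), sha1 is stable across processes and practically
    # collision-free, at the cost of computing a digest (cheap, since the
    # implementation is in C).
    canonical = repr(sorted(kwargs.items()))
    return '%s:%s' % (table, hashlib.sha1(canonical.encode()).hexdigest())
```

Whether that cost matters versus hash(), or versus the Hsieh and Jenkins functions jdunck links, is exactly the "bench sha" item on the to-do list.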
[10:10pm] zeeg: right
[10:10pm] zeeg: i think new is pretty much identical
[10:10pm] jdunck: eh, i'd just tell people to override objects
[10:10pm] jdunck: that's what we're doing in gis
[10:11pm] zeeg: well then they have to override save/delete also
[10:11pm] zeeg: or i have to use signals [yuck]
[10:11pm] jdunck: hmm
[10:11pm] zeeg: id rather just say "this model is cached" and it work
[10:11pm] zeeg: I tried doing it as a mix-in but you cant override certain things
[10:11pm] jdunck: i guess it didn't work to just make cachedmodel derive from models.Model, but not set __metaclass__ ?
[10:12pm] zeeg: iirc theres issues
[10:12pm] zeeg: being that you cant properly subclass models
[10:19pm] jdunck: looks like _get_sorted_clause_key is sorting the joined-up where string
[10:19pm] jdunck: admittedly unlikely to collide, but probably not what was intended
[10:20pm] jdunck: that is, the 2nd val in the return from queryset._get_sql_clause is the string form of the join/where clause
[10:24pm] jdunck: anyway, i've gotta run soon.
[10:25pm] jdunck: what do you think of invalidating the whole list
[10:25pm] jdunck: if a key is missing from the multiget?
[10:29pm] jdunck: here's the to-do list from chat here and code comments:
[10:29pm] jdunck: Investigate whether ferringb (of Curse) supplied signal performance patch
[10:29pm] jdunck: Add multiget to all cache backends
[10:29pm] jdunck: Add max_key_length, max_value_length to all cache backends
[10:29pm] jdunck: Add memcache's replace semantics for replace-only-if-exists semantics
[10:29pm] jdunck: Support splitting qs values over max_value_length (in other words, do multiple sets and gets for a single list of objects if needed)
[10:29pm] jdunck: Bench sha vs. (python) hsieh and jenkins
[10:29pm] jdunck: Test w/o CachedModel __metaclass__ since that's a bit silly.
[10:29pm] jdunck: Invalidate whole list if any key in list is missing - ask dcramer
[10:29pm] jdunck: All related field descriptors should check cache first
[10:29pm] jdunck: Port to qs-refactor
[10:29pm] zeeg: ya
[10:29pm] zeeg: that was the plan
[10:30pm] zeeg: if key is missing invalidate parent
[10:30pm] zeeg: but ya that sounds good
[10:31pm] jdunck: k
[10:31pm] jdunck: offline-- will post this to list

(Posted to the Google Groups "Django developers" group: http://groups.google.com/group/django-developers?hl=en)
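The "invalidate whole list if any key in list is missing" plan zeeg confirms above could work like this: after the multiget for the per-object keys, any miss (eviction, expiry) means the cached list can no longer be trusted, so the parent key is dropped and everything rebuilt from the database. A sketch with a dict standing in for the backend; all names are invented.

```python
cache = {}  # stand-in backend: key -> value

def get_objects(list_key, fetch_from_db):
    # The parent key holds the ordered list of per-object keys.
    object_keys = cache.get(list_key)
    if object_keys is not None:
        # Multiget for the individual objects.
        found = dict((k, cache[k]) for k in object_keys if k in cache)
        if len(found) == len(object_keys):
            return [found[k] for k in object_keys]
        # Any miss means the list can't be trusted: invalidate the
        # parent key and rebuild below.
        del cache[list_key]
    rows = fetch_from_db()  # -> list of (object_key, object) pairs
    for k, obj in rows:
        cache[k] = obj
    cache[list_key] = [k for k, _ in rows]
    return [obj for _, obj in rows]
```

Rebuilding the whole list on a single miss is deliberately conservative; the alternative, fetching only the missing rows, would need per-row queries and loses the single-batch-query property.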
