On Nov 5, 2007, at 3:24 , K J wrote:
The problem is, this membership list will also be displayed when
users request it. So would it be best to cache the entire list
somehow?
Yes, but I wouldn't try to think of that as something related to
your membership check.
On Nov 5, 2007, at 3:34 , K J wrote:
I had thought of doing a bloom filter on it as well. The problem
here is, the membership list might change sometimes, and reading
info on bloom filters, it's not really well-suited for dynamically
changing lists?
Well, bloom filters are fine as long as you're only adding to them.
I wasn't suggesting actually using a bloom filter, though. Just to
think of it more like a bloom filter in that you can only get an
answer that is either possibly present or definitely not present
(it's a set representation, not a map representation).
It was late, though, so I guess I was more typing whatever I was
thinking than describing what was a best fit for your question.
memcached is a map, so you can actually store a yes or no, but the
absence of either is a ``possibly.''
Suppose I cache 10,000 recently-logged in members. Also, suppose
50% of traffic actually come from these users. Then, this cache
would have a high hit ratio when testing for membership.
The 10,000 is just a preload. You can certainly cache the rest of
them as they actually come in.
However, what about the non-members? For instance let's say 40% of
the traffic come from non-members. This would mean there'd need to
be a full listing of members to check?
You have that in your primary data store. It just means you'd have
to go ask it and cache it.
Hmmm an interesting thought did just come across my mind. Let me
hear your thoughts:
Cache 10,000 most recently logged-in members
Bloom filter on the entire list
This way, you can test for negatives (bloom filter), and if there's
a positive, check the 10,000 most recently logged-in users. If
that still yields nothing, then do a database query. In effect,
only a small minority of checks would require a trip to the DB.
The bloom filter will not help you unless you have too many active
users to fit into a memcached cluster (and you don't).
Quick review of what went through my head:
1) You want set operations (particularly ∈)
2) Thinking of memcached as a normal set falls apart because of
evictions, so how about ∉
3) There *is* a value that will be stored at roughly no cost, so we
can do both.
#2 is the bloom filter. After realizing you'd have to go really far
out of your way to get any savings out of a 0 byte value vs. a 1 byte
value, I figured you might as well store a boolean. Unlike a bloom
filter, this means:
1) You can check for presence in a single operation.
2) You can further confirm whether the value indicates presence or
absence.
3) You can both add items *and* remove items after going to the
source.
So, I apologize for the confusion, but I'd recommend something like
this:
def is_special_user(uid):
val=memcache.get(uid)
if val is not None:
val=query_to_get_special_flag_for(uid)
memcache.set(uid, val)
return val
You could preload some if you felt you needed to do so. Adding or
removing membership can specifically update the cache.
As I mentioned above, your listing problem should be considered a
different one. I'd suspect it could be done more lazily than the
membership check, but if not, you could always invalidate the list
when adding or removing membership.
On Nov 4, 2007, at 23:32, J A wrote:
> I have a fairly large members list that I want to keep in memcache.
> What I do with this list is query it against particular user IDs to
> see if they are a member of that list or not. If they are they get
> certain priviledges.
>
> The problem is, this list has gotten to the point of saturating the
> PHP's memory when fetching the MySQL query the first time.
>
> Is there a way to do this more effectively, for instance,
> partitioning the list into separate smaller lists, grouped by time
> of login? I'm thinking of this, as users who have logged in in the
> past 3 months are more likely to be in the list anyway.
It'd be easier to not think of it as a list if you're just
testing
for membership. All you want to know is if a particular object is an
element of a particular set. You could do this by key convention if
you batch populate the records.
However, memcached semantics don't quite give you what you
want.
Depending on whether you can reasonably get a configuration to do what
you want, it might be easier to think of memcached as a bloom filter
than as a set in this case. That is, if you negatively cache things
that *aren't* part of your list, then the presence of a key will tell
you for certain that a particular key is not a member, but the absence
of a key would mean that you don't know (or perhaps memcached *did*
know, but had since forgotten).
You could, of course, record the status either way so as to
tell the
difference between not knowing and knowing whether it's a member or
not. This is probably best suited to your needs.
You could optionally preload objects that are likely to be
used if
you think the natural population wouldn't do it effectively (you can
measure this with stats).
--
Dustin Sallings
--
Dustin Sallings