On Nov 5, 2007, at 3:24 , K J wrote:
The problem is, this membership list will also be displayed when users request it. So would it be best to cache the entire list somehow?

Yes, but I wouldn't try to think of that as something related to your membership check.


On Nov 5, 2007, at 3:34 , K J wrote:

I had thought of doing a bloom filter on it as well. The problem here is, the membership list might change sometimes, and reading info on bloom filters, it's not really well-suited for dynamically changing lists?

Well, bloom filters are fine as long as you're only adding to them. I wasn't suggesting actually using a bloom filter, though. Just to think of it more like a bloom filter in that you can only get an answer that is either possibly present or definitely not present (it's a set representation, not a map representation).

It was late, though, so I guess I was more typing whatever I was thinking than describing what was a best fit for your question.

memcached is a map, so you can actually store a yes or no, but the absence of either is a ``possibly.''

Suppose I cache 10,000 recently-logged in members. Also, suppose 50% of traffic actually come from these users. Then, this cache would have a high hit ratio when testing for membership.

The 10,000 is just a preload. You can certainly cache the rest of them as they actually come in.

However, what about the non-members? For instance let's say 40% of the traffic come from non-members. This would mean there'd need to be a full listing of members to check?

You have that in your primary data store. It just means you'd have to go ask it and cache it.

Hmmm an interesting thought did just come across my mind. Let me hear your thoughts:
Cache 10,000 most recently logged-in members
Bloom filter on the entire list
This way, you can test for negatives (bloom filter), and if there's a positive, check the 10,000 most recently logged-in users. If that still yields nothing, then do a database query. In effect, only a small minority of checks would require a trip to the DB.

The bloom filter will not help you unless you have too many active users to fit into a memcached cluster (and you don't).

        Quick review of what went through my head:

        1)  You want set operations (particularly ∈)
2) Thinking of memcached as a normal set falls apart because of evictions, so how about ∉ 3) There *is* a value that will be stored at roughly no cost, so we can do both.

#2 is the bloom filter. After realizing you'd have to go really far out of your way to get any savings out of a 0 byte value vs. a 1 byte value, I figured you might as well store a boolean. Unlike a bloom filter, this means:

        1)  You can check for presence in a single operation.
2) You can further confirm whether the value indicates presence or absence. 3) You can both add items *and* remove items after going to the source.

So, I apologize for the confusion, but I'd recommend something like this:


def is_special_user(uid):
        val=memcache.get(uid)
        if val is not None:
                val=query_to_get_special_flag_for(uid)
                memcache.set(uid, val)
        return val

You could preload some if you felt you needed to do so. Adding or removing membership can specifically update the cache.

As I mentioned above, your listing problem should be considered a different one. I'd suspect it could be done more lazily than the membership check, but if not, you could always invalidate the list when adding or removing membership.


On Nov 4, 2007, at 23:32, J A wrote:

> I have a fairly large members list that I want to keep in memcache.
> What I do with this list is query it against particular user IDs to
> see if they are a member of that list or not.  If they are they get
> certain priviledges.
>
> The problem is, this list has gotten to the point of saturating the
> PHP's memory when fetching the MySQL query the first time.
>
> Is there a way to do this more effectively, for instance,
> partitioning the list into separate smaller lists, grouped by time
> of login?  I'm thinking of this, as users who have logged in in the
> past 3 months are more likely to be in the list anyway.


It'd be easier to not think of it as a list if you're just testing
for membership.  All you want to know is if a particular object is an
element of a particular set.  You could do this by key convention if
you batch populate the records.

However, memcached semantics don't quite give you what you want.
Depending on whether you can reasonably get a configuration to do what
you want, it might be easier to think of memcached as a bloom filter
than as a set in this case.  That is, if you negatively cache things
that *aren't* part of your list, then the presence of a key will tell
you for certain that a particular key is not a member, but the absence
of a key would mean that you don't know (or perhaps memcached *did*
know, but had since forgotten).

You could, of course, record the status either way so as to tell the
difference between not knowing and knowing whether it's a member or
not.  This is probably best suited to your needs.

You could optionally preload objects that are likely to be used if
you think the natural population wouldn't do it effectively (you can
measure this with stats).

--
Dustin Sallings





--
Dustin Sallings


Reply via email to