Has anybody tried building an in-memory Bloom filter in front of an index 
to reduce datastore read operations?

In my application, I have an exact-match query on a single field, and it 
commonly matches no results. However, I still have to pay for datastore 
read operations in this case.

My idea is to build a Bloom filter over every value of the field in my 
datastore. Given a query input, if the filter reports the value as a 
member of the set, I query the datastore, which may or may not match 
results (a false positive in the no-match case). If the filter reports 
non-membership, I can skip the datastore read entirely, since Bloom 
filters produce no false negatives.

The Bloom filter would be wrapped in an App Engine model and stored in 
both the datastore and memcache. The write rate to this index is fairly 
low, so I plan to update the filter transactionally and re-cache it on 
every write. The updates could also be done offline via a task queue.
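For storage, the filter's bit array could be kept in the model's blob 
property. Since an exact-match filter is often sparse, compressing it 
before writing to the datastore and memcache should shrink it 
considerably. A sketch of the round-trip, assuming the filter state is a 
`bytearray` of bits:

```python
import zlib


def serialize_filter(bits):
    # A sparse filter is mostly zero bytes and compresses well,
    # which reduces both storage size and memcache transfer time.
    return zlib.compress(bytes(bits))


def deserialize_filter(blob):
    return bytearray(zlib.decompress(blob))
```

The compressed blob is what would be written to the blob property and 
cached; each request would decompress it back into the in-memory filter.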

The goal is to reduce the cost of searches, especially in the no-match 
case. I believe this change would cut datastore read operations but 
increase CPU time, since each request would have to read and deserialize 
a potentially large Bloom filter from memcache. The tradeoff could be 
tuned to the needs of the app: a larger filter produces fewer false 
positives and therefore fewer wasted datastore reads.
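For tuning, the standard sizing formulas relate the filter size m (bits), 
element count n, and target false-positive rate p: 
m = -n ln(p) / (ln 2)^2, with k = (m/n) ln 2 hash functions. A quick 
sketch of the arithmetic (the n and p values below are just example 
inputs):

```python
import math


def bloom_size(n, p):
    """Bits and hash count needed for n elements at false-positive rate p."""
    m = -n * math.log(p) / (math.log(2) ** 2)
    k = (m / n) * math.log(2)
    return int(math.ceil(m)), int(math.ceil(k))


bits, hashes = bloom_size(1_000_000, 0.01)
# ~9.6 million bits (~1.2 MB) and 7 hash functions for a million
# values at a 1% false-positive rate.
```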

Thoughts?

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-appengine/-/ViVc7VJ8iOAJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
