Awesome! Thanks a million, will read the paper right after my holiday.
Thanks a lot for sharing.
Art

On Feb 6, 2013 6:32 PM, "Jakub Łopuszański" <[email protected]> wrote:

As I really want these ideas to be well understood and widespread, I prepared:
- a paper <http://vanisoft.pl/~lopuszanski/public/cache_invalidation.pdf> with nice illustrations and proofs of correctness,
- a presentation <https://docs.google.com/presentation/d/15--B_I64Or54mXcakEpT8KkpAGfYl-kLh-Qvqt31GKI/edit?usp=sharing> with more context and examples from the everyday life of a programmer,
- the source code <https://github.com/qbolec/PHP-Framework/tree/master/php/framework/relations/managers> of a working implementation used in "Znawcy Futbolu" <http://nk.pl/aplikacje/znawcyfutbolu> and "Znasz Mnie?" <http://nk.pl/aplikacje/znaszmnie>, the applications by Vanisoft.

Hate it or ignore it, but at least do not say it is impractical, nor that I did not try to explain it :)

On Saturday, April 28, 2012 at 00:14:49 UTC+2, Jakub Łopuszański wrote:

*There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton*

Hi, I'd like to share with you an algorithm for cache invalidation that I've come up with and successfully implemented in a real-world application. I am a software architect at nk.pl, a Polish social network successfully competing with FB, and a founder and backend architect of Vanisoft, a startup in which I had more freedom to experiment with a different architecture.

Perhaps my idea is not original, but I could not find a description of a similar concept anywhere, so even if this is reinventing the wheel, maybe it will help propagate the knowledge and save somebody else's time.
When I was designing the backend for Vanisoft's project ZnaszMnie (currently over 400k players), all I could find on the internet about ways to cache database query results was either wrong (such as no cache invalidation at all) or ineffective (trying to keep a list of all cached results and invalidating them one by one).

I assume you have more than one front-end machine, some databases, and a cache reachable from the frontends. Perhaps you also have a local cache on each front-end machine, which is really fast but inaccessible from other machines (and thus subject to cache invalidation problems). If so, then my approach can correctly provide a cached answer to a query using one round-trip to the global cache and one to the local cache (the cost of the latter is usually negligible), and never serve stale data (and that last part is the most important to me).

I assume that even if your database is not SQL, you can think about your queries as expressible in SQL, using SELECT, UPDATE, or DELETE. This is needed only to present the algorithm, which can easily be adapted to many other cases.

To keep things simple, let us consider a binary relation (a table with two columns, or a set of points in 2-dimensional space, if you prefer). The algorithm works for any number of dimensions, but it is easier to grasp in the 2-D case.

So let's think about a table `songs` with columns "song_id" and "author_id", where obviously some songs may have several authors and vice versa. Again, for simplicity, assume that these are integers, while the algorithm works for any (serializable) data type.

For each query, I'd like to introduce the concept of the "subspace" on which it operates. If you choose the interpretation of a binary relation as a set of points in 2-dimensional space, then a subspace is what you get by fixing some of the dimensions.
Some examples -- a list of queries with their subspaces:

INSERT INTO songs (song_id, author_id) VALUES (1,2) :: [song_id=1, author_id=2]
SELECT COUNT(*) FROM songs WHERE author_id = 7 :: [author_id=7]
DELETE FROM songs WHERE song_id = 13 :: [song_id=13]
DELETE FROM songs :: []

Updates are a bit more complicated, and I'd prefer to think about them as a DELETE followed by an INSERT, for example:

UPDATE songs SET author_id=13 WHERE author_id=7 AND song_id=3

can be thought of as:

DELETE FROM songs WHERE author_id=7 AND song_id=3 :: [song_id=3, author_id=7]
INSERT INTO songs (song_id, author_id) VALUES (3,13) :: [song_id=3, author_id=13]

I know this is not equivalent and may cause some race conditions, but I believe it does not really matter in what follows.

As you can see in our example, a subspace can be 0-, 1-, or 2-dimensional, depending on the scope on which the query operates. If we are unsure about the actual scope, we need to upper-bound it for my algorithm to work. For example:

SELECT COUNT(*) FROM songs WHERE song_id < 13 :: []

Here we cannot say much about the rows which are important for this query, so we upper-bound it with the whole space. Please note how the subspace has nothing to do with the columns returned by the SELECT, but rather with the rows scanned during it.

In general, the subspace is determined by all equality constraints found in the query. This is such a trivial correspondence that you can (and should) let your backend library deduce it automatically from the query string. Or if you use some object-oriented query builder, you get it for free.
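The constraints-to-subspace correspondence is mechanical enough to sketch in a few lines of Python. This is a sketch under my own naming, not the author's PHP implementation: `subspace` and the constraint dict are illustrative, and a real version would extract the equality constraints from the query string or builder.

```python
# Derive the subspace vector for a query on songs(song_id, author_id)
# from its equality constraints.  Columns without an equality
# constraint (including those under inequalities) upper-bound to "*".
COLUMNS = ("song_id", "author_id")

def subspace(eq_constraints):
    """eq_constraints: column -> value, taken from WHERE/VALUES equalities."""
    return tuple(eq_constraints.get(col, "*") for col in COLUMNS)

# SELECT COUNT(*) FROM songs WHERE author_id = 7   :: (*, 7)
print(subspace({"author_id": 7}))                  # ('*', 7)
# DELETE FROM songs WHERE song_id = 13             :: (13, *)
print(subspace({"song_id": 13}))                   # (13, '*')
# SELECT COUNT(*) FROM songs WHERE song_id < 13    :: (*, *)
print(subspace({}))                                # ('*', '*')
```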
For 2-dimensional space, there are 4 kinds of subspaces:
- the whole space, given by no constraints at all: []
- a point, given by two constraints, for example [song_id=3, author_id=7]
- a vertical or horizontal line, given by one constraint, for example [song_id=3] or [author_id=7]

If we have two queries A and B, such that A is a write (DELETE or INSERT) and B is a read (SELECT), then a cached result for B should be invalidated after query A if the subspace of A intersects the subspace of B. (Actually this is a sufficient condition, not a necessary one.) For example the query

DELETE FROM songs WHERE author_id=7 AND song_id=3 :: [song_id=3, author_id=7]

affects the results of

SELECT COUNT(*) FROM songs WHERE author_id = 7 :: [author_id=7]

and you can tell just by comparing [author_id=7] with [song_id=3, author_id=7], without looking at the queries. Of course this is a safe approximation: sometimes cache invalidation is not necessary, but performing it ensures the correctness of the algorithm.

Now, notice that there may be infinitely many queries which need to be invalidated, as there are many subspaces that intersect a given one. For example a query with subspace [author_id=7] must invalidate all queries with subspace [song_id=1], as well as [song_id=2], etc. Moreover, there are infinitely many possible queries with the same subspace. Frameworks that try to keep track of all intersecting queries are doomed either to store everything in a large, permanent storage (which kind of defeats the purpose of an in-memory cache, to me) or to eventually forget about a query which needs to be invalidated (which is an error, to me).

The solution is much simpler, though, and does not involve operating on infinitely many cache keys... only exponentially many. But don't worry, it is exponential in the number of dimensions, not in the size of the dataset.
For example, for a binary relation we need to invalidate 2^2 = 4 keys. If you have a table with many columns, please don't be alarmed either -- you just need to choose a few columns on which you'd like to focus, and crop the subspace to those few columns. You will invalidate things more often than necessary, but will not get hit by the exponential blow-up. I've never experienced a case with more than 3 relevant columns, and 2^3 = 8 is not a big number.

We need some shorter notation for subspaces. Let us write them as 2-dimensional vectors (similar to rows in the table). Instead of [song_id=3, author_id=7], let us write (3,7), and for [song_id=3], we will use (3,*). The whole space is (*,*), and [author_id=4] is just (*,4).

Now the tricky part. Let us introduce a new wildcard symbol "?", which will mean something along the lines of "all subspaces which have a number here". For example (?,3) is a shorthand for the infinite set of subspaces {(0,3),(1,3),(2,3),...}. There is no particular query that has subspace (?,3). This artificial notation corresponds to all queries which have the second dimension fixed.

I assume that your global cache (such as memcache) supports an atomic increment operation, which is very convenient for the task of maintaining a "revision number", which we will associate with each subspace (artificial or not). When I say "with each", I don't mean buying an infinitely large computer -- we will just allocate memory for subspaces actually used in production, and those unused (or rarely used, if you have an LRU garbage collector) will simply take no memory at all.

A short note about evictions: if a get or increment results in a miss, simply set the revision to the current timestamp.

I also assume that you know the idea of generational caching, that is, using a key name with an appended revision to store cached data (such as "get_songs_count.123").
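The revision-per-subspace and generational-key ideas can be sketched as follows. This is a toy model under my own names: a plain dict stands in for memcache, and `bump` plays the role of the atomic increment.

```python
import time

cache = {}  # stands in for the global memcache

def get_revision(subspace):
    key = ("rev", subspace)
    if key not in cache:                  # miss (e.g. after an eviction):
        cache[key] = int(time.time())     # seed with the current timestamp
    return cache[key]

def bump(subspace):
    # stands in for memcache's atomic increment
    key = ("rev", subspace)
    cache[key] = cache.get(key, int(time.time())) + 1

def result_key(query, subspace):
    # cached data lives under query-name + revision; bumping the revision
    # makes every older key unreachable (LRU eventually reclaims it)
    return "%s.%d" % (query, get_revision(subspace))

k1 = result_key("get_songs_count", ("*", 7))
bump(("*", 7))                            # a write touching [author_id=7]
k2 = result_key("get_songs_count", ("*", 7))
assert k1 != k2                           # the old cached result is now unreachable
```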
This way, a single operation of incrementing a revision can invalidate all data stored at key names which depended on it. This seems like a waste of memory, but an LRU garbage collector can take care of the leftovers. The nice thing about it is that while a single integer (the revision number) must be stored in the global cache, the potentially large query result can be stored in the local cache, from where it can be quickly retrieved without any bottlenecks of the global cache. This is actually the only way (except for using silly small TTLs) that I know of and use for invalidating data stored in a local cache.

Now the algorithm. Or actually two algorithms, one for read-only queries and one for writes.

When you perform a read query A, first determine its subspace. I assume you have d columns, so the subspace can be expressed as a d-ary vector of numbers and stars, as explained above. For example the query "SELECT song_id FROM songs WHERE author_id = 3" corresponds to the vector (*,3).

Then compute all possible variations of this d-ary vector, by using a single substitution rule: you can put "?" in place of a number. For example, if your original vector was (*,3), this results in two vectors: (*,?) and the original. If, on the other hand, your query had subspace (7,3), then you would end up with four variations: (7,3), (?,3), (7,?), (?,?).

Then, using a bulk (multi)get, fetch from the global cache the revisions for those (at most 2^d) subspaces. Concatenate the revisions into one big revision (for example 123.3.114.14). Then fetch from the local cache the cached result of the query, which should reside at the key which is a concatenation of the query string (or a hash of it, if you prefer) and the big concatenated revision. If there is a miss in the local cache, try the global cache. If there is a miss in the global cache, go to the database. Then add the missing keys to the caches as needed.
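The read path above can be sketched like this. Again a toy model with my own names: dicts stand in for the local and global caches, the revisions are fetched one by one instead of with a real multiget, and a miss seeds the revision with 1 rather than a timestamp.

```python
from itertools import product

def read_variants(subspace):
    """All ways to put '?' in place of a number; the '*'s stay put (<= 2^d)."""
    return [tuple(v) for v in
            product(*[(c,) if c == "*" else (c, "?") for c in subspace])]

def cached_read(query, subspace, local, global_, db_fetch):
    # 1. fetch the (at most 2^d) revisions from the global cache
    revs = [global_.setdefault(("rev", v), 1) for v in read_variants(subspace)]
    # 2. concatenate into one big revision and build the generational key
    key = "%s.%s" % (query, ".".join(map(str, revs)))
    # 3. local cache, then global cache, then the database
    if key in local:
        return local[key]
    if key not in global_:
        global_[key] = db_fetch()
    local[key] = global_[key]
    return local[key]

print(read_variants(("*", 3)))   # [('*', 3), ('*', '?')]
```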
When you perform a write query B, determine its subspace. Then compute exactly 2^d possible variations of it, by using two substitution rules:
a) you can put "?" in place of "*"
b) you can put "*" in place of any number
For example, for a subspace (*,3) you'll end up with four variants: (*,3), (?,3), (*,*), (?,*). Then perform the query B. Finally, increment the revisions for all these variants.

As you can see, updates are a bit slower than reads (which in my opinion is usual, and nothing to be worried about). Some cache implementations allow bulk increments in one round-trip, in which case this might actually be faster than the read scenario, as you do not contact the local cache at all.

Now, some explanation of why it works. Imagine a bipartite graph which on the left-hand side has one vertex per possible subspace of a write query, and on the right side has vertices corresponding to subspaces of read queries. Actually both sets are equal, but we will focus on the edges. An edge goes from left to right if a query on the left side affects the results of a query on the right side. As said before, both sets are infinite, but that's not the problem. There are infinitely many edges, but that's also not bad. What's bad is that there are nodes on the left side with infinite degree, which means we would need to invalidate infinitely many queries. What the above tricky algorithm does is add a third layer to the graph, in the middle between the two, such that the transitive closure of the resulting graph is still the same (in other words: you can still get, using two edges, anywhere you could get by one edge in the original graph), yet each node on the left and each node on the right has finite (actually constant) degree. This middle layer corresponds to the artificial subspaces with "?" marks, and serves as a connecting hub for all the mess.
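The write path, sketched in the same toy model (my names, a dict for the global cache; a production version would issue one bulk increment):

```python
from itertools import product

def write_variants(subspace):
    """Exactly 2^d variants: each '*' may become '?', each number may become '*'."""
    return [tuple(v) for v in
            product(*[("*", "?") if c == "*" else (c, "*") for c in subspace])]

def perform_write(global_, subspace, run_query):
    run_query()                                # the actual DELETE/INSERT
    for v in write_variants(subspace):         # one bulk increment in practice
        global_[("rev", v)] = global_.get(("rev", v), 0) + 1

assert set(write_variants(("*", 3))) == {("*", 3), ("?", 3), ("*", "*"), ("?", "*")}
```

Bumping ("*", "?") here is exactly what makes the cached_read sketch above recompute any read whose subspace fixes the second column.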
Now, when a query on the left executes, it needs to inform only its (small number of) neighbours about the change, moving the burden of reading this information to the right. That is, a query on the right side needs to check if there is a message in the "inbox" in the middle layer. So you can think about it as a cooperation where the left query makes one step forward and the right query makes one step back, to meet at the central place and pass the important information about the invalidation of the cache.

Another point of view is that two subspaces (v[1],...,v[d]) and (u[1],...,u[d]) intersect each other if and only if they agree on all positions without stars. Or in other words: there is no index k such that v[k]<>u[k] and v[k]<>* and u[k]<>*.

So, given a vector (v[1],...,v[d]), you can predict that any intersecting subspace will be described by a vector (u[1],...,u[d]) of some particular form. That is, it can be obtained from (v[1],...,v[d]) by using some simple substitution rules:
a) you can change any number to a star
b) you can change any star to any number
c) you cannot use both of the above rules for the same index k
The problematic rule b) allows infinitely many possibilities. To overcome this we introduced the question mark symbol, which represents "any number" without stating any particular number. A path of operations leading from (v[1],...,v[d]) to (u[1],...,u[d]) is then explored from the other end, by trying to substitute numbers in (u[1],...,u[d]) with question marks. If we can reach the same term from both ends, then we know that there is a path from one end to the other, even though there is an infinite maze in the middle.

As pointed out before, you can use this technique for a higher number of columns, or keep the focus on just a few of them at the expense of premature invalidation.
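The "same term from both ends" argument can be checked mechanically. The sketch below (my names, d = 2) verifies exhaustively over a tiny value universe that a write's variants and a read's variants share a vector exactly when the two subspaces intersect:

```python
from itertools import product

def intersects(v, u):
    """Subspaces intersect iff no position holds two different numbers."""
    return all(a == "*" or b == "*" or a == b for a, b in zip(v, u))

def read_variants(s):
    """Read side: each number may become '?'."""
    return set(product(*[(c,) if c == "*" else (c, "?") for c in s]))

def write_variants(s):
    """Write side: each '*' may become '?', each number may become '*'."""
    return set(product(*[("*", "?") if c == "*" else (c, "*") for c in s]))

# The meet-in-the-middle ("hub") property, checked exhaustively for d = 2:
universe = ["*", 1, 2]
for w in product(universe, repeat=2):
    for r in product(universe, repeat=2):
        assert bool(write_variants(w) & read_variants(r)) == intersects(w, r)
print("hub property holds")
```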
You can easily use this for a topology without local caches, by storing everything in the global cache. You can adapt the technique to columns storing strings, or nulls, or whatever. You can use it for NoSQL databases if you need to. But this is not the end of the possible improvements.

If you know in advance what kind of write queries you will perform -- that is, if you know what subspaces they will have -- you can "trim" the graph to a smaller one, which will result in faster reads and writes, as you will not have to fetch that many (is 4 many?) keys.

For example, if your modifications are just single-row operations (INSERT one row, DELETE one row), then your modifying queries' subspaces never have stars in them. You can immediately see from the substitution rules that you will never ever increment revisions of artificial subspaces (those with question marks), as "?" can only be made from "*". If so, then it is not necessary to fetch them, or even think about them. You can safely ignore them. Now, read queries only depend on revisions of vectors which differ from the original vector by replacing something with question marks. Since such revisions are never bumped in this setting, you only need to fetch the revision of the original vector, which obviously has no question marks. This allows for a quite fast and simple implementation, which fetches only a single key from the global cache. Actually, my original framework forbade any modifications other than row-by-row just to get these extra-fast reads. I believe that in many real-world scenarios you never INSERT more than a single row at once. I've seen cases where you had an urge to DELETE many rows at once (for example when cleaning up dependent records), and may feel bad about the idea of doing it row by row.
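The trimmed single-row setting is small enough to sketch in full (my names, a dict for the global cache): reads fetch exactly one revision, writes bump the 2^d star-substitutions of their star-free vector.

```python
from itertools import product

def single_row_read_key(query, subspace, global_):
    # reads fetch only the original vector's revision -- no '?' variants needed
    rev = global_.setdefault(("rev", subspace), 1)
    return "%s.%d" % (query, rev)

def single_row_write(global_, subspace):
    # star-free write vector: each coordinate either stays or becomes '*'
    for v in product(*[(c, "*") for c in subspace]):
        global_[("rev", v)] = global_.get(("rev", v), 0) + 1

g = {}
k_before = single_row_read_key("get_songs_count", ("*", 7), g)
single_row_write(g, (3, 7))          # INSERT (3,7) touches the (*,7) line
k_after = single_row_read_key("get_songs_count", ("*", 7), g)
assert k_before != k_after           # the cached count is correctly invalidated
```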
Think about the amortized cost, though: each of the rows you delete had to be added at some point in time, and if you already feel comfortable with the idea of adding rows one by one, then you have already paid enough credit not to worry about the deletes.

That's it, folks. Now, which frameworks implement this idea, in which paper can I read about it, and by how many years did I oversleep?

--
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
