Awesome! Thanks a million, will read the paper right after my holiday.
Thanks a lot for sharing.
Art

On Feb 6, 2013 6:32 PM, "Jakub Łopuszański" <[email protected]> wrote:

As I really want these ideas to be well understood and widespread, I prepared:
- a paper <http://vanisoft.pl/~lopuszanski/public/cache_invalidation.pdf> with nice illustrations and proofs of correctness,
- a presentation <https://docs.google.com/presentation/d/15--B_I64Or54mXcakEpT8KkpAGfYl-kLh-Qvqt31GKI/edit?usp=sharing> with more context and examples from the everyday life of a programmer,
- the source code <https://github.com/qbolec/PHP-Framework/tree/master/php/framework/relations/managers> of a working implementation used in "Znawcy Futbolu" <http://nk.pl/aplikacje/znawcyfutbolu> and "Znasz Mnie?" <http://nk.pl/aplikacje/znaszmnie>, the applications by Vanisoft.

Hate it or ignore it, but at least do not say it is impractical, nor that I did not try to explain it :)

On Saturday, April 28, 2012 at 00:14:49 UTC+2, Jakub Łopuszański wrote:

*There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton*

Hi, I'd like to share with you an algorithm for cache invalidation that I've come up with and successfully implemented in a real-world application. I am a software architect at nk.pl, a Polish social network successfully competing with FB, and a founder and backend architect of Vanisoft, a startup in which I had more freedom to experiment with a different architecture.

Perhaps my idea is not original, but I could not find a description of a similar concept anywhere, so even if this is reinventing the wheel, maybe it will help propagate the knowledge and save somebody else's time.
When I was designing the backend for Vanisoft's project ZnaszMnie (currently over 400k players), all I could find on the internet about ways to cache database query results was either wrong (such as no cache invalidation at all) or ineffective (trying to keep a list of all cached results and invalidating them one by one).

I assume you have more than one front-end machine, some databases, and a cache reachable from the frontends. Perhaps you also have a local cache on each front-end machine, which is really fast but inaccessible from other machines (and thus subject to cache invalidation problems). If so, then my approach can correctly provide a cached answer to a query using one round-trip to the global cache and one to the local cache (the cost of the latter is usually negligible), and never serve stale data (and that last part is the most important to me).

I assume that even if your database is not SQL, you can think about your queries as expressible in SQL, using SELECT, UPDATE, or DELETE. This is needed only to present the algorithm, which can easily be adapted to many other cases.

To keep things simple, let us consider a binary relation (a table with two columns, or a set of points in 2-dimensional space, if you prefer). The algorithm works for any number of dimensions, but it is easier to grasp in the 2-D case.

So let's think about a table `songs` with columns "song_id" and "author_id", where obviously some songs may have several authors and vice versa. Again, for simplicity, assume that these are integers, while the algorithm works for any (serializable) data type.

For each query, I'd like to introduce the concept of the "subspace" on which it operates. If you choose the interpretation of a binary relation as a set of points in 2-dimensional space, then a subspace is what you get by fixing some of the dimensions.
Some examples -- a list of queries with their subspaces:

INSERT INTO songs (song_id, author_id) VALUES (1,2) :: [song_id=1, author_id=2]
SELECT COUNT(*) FROM songs WHERE author_id = 7 :: [author_id=7]
DELETE FROM songs WHERE song_id = 13 :: [song_id=13]
DELETE FROM songs :: []

Updates are a bit more complicated, and I'd prefer to think about them as a DELETE followed by an INSERT, for example:

UPDATE songs SET author_id=13 WHERE author_id=7 AND song_id=3

can be thought of as:

DELETE FROM songs WHERE author_id=7 AND song_id=3 :: [song_id=3, author_id=7]
INSERT INTO songs (song_id, author_id) VALUES (3,13) :: [song_id=3, author_id=13]

I know this is not equivalent and may cause some race conditions, but I believe it does not really matter in what follows.

As you can see in our example, a subspace can be 0-, 1-, or 2-dimensional, depending on the scope on which the query operates. If we are unsure about the actual scope, we need to upper-bound it for my algorithm to work. For example:

SELECT COUNT(*) FROM songs WHERE song_id < 13 :: []

Here we cannot say much about the rows which are important for this query, so we upper-bound it with the whole space. Please note how the subspace has nothing to do with the columns returned by the SELECT, but rather with the rows scanned during it.

In general, the subspace is determined by all equality constraints found in the query. This is such a trivial correspondence that you can (and should) let your backend library deduce it automatically from the query string. Or if you use some object-oriented query builder, you get it for free.
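The constraints-to-subspace correspondence is mechanical enough to sketch in a few lines of Python. This is a sketch under my own naming, not the author's PHP implementation: `subspace` and the constraint dict are illustrative, and a real version would extract the equality constraints from the query string or builder.

```python
# Derive the subspace vector for a query on songs(song_id, author_id)
# from its equality constraints.  Columns without an equality
# constraint (including those under inequalities) upper-bound to "*".
COLUMNS = ("song_id", "author_id")

def subspace(eq_constraints):
    """eq_constraints: column -> value, taken from WHERE/VALUES equalities."""
    return tuple(eq_constraints.get(col, "*") for col in COLUMNS)

# SELECT COUNT(*) FROM songs WHERE author_id = 7   :: (*, 7)
print(subspace({"author_id": 7}))                  # ('*', 7)
# DELETE FROM songs WHERE song_id = 13             :: (13, *)
print(subspace({"song_id": 13}))                   # (13, '*')
# SELECT COUNT(*) FROM songs WHERE song_id < 13    :: (*, *)
print(subspace({}))                                # ('*', '*')
```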
For 2-dimensional space, there are 4 kinds of subspaces:
- the whole space, given by no constraints at all: []
- a point, given by two constraints, for example [song_id=3, author_id=7]
- a vertical or horizontal line, given by one constraint, for example [song_id=3] or [author_id=7]

If we have two queries A and B, such that A is a write (DELETE or INSERT) and B is a read (SELECT), then a cached result for B should be invalidated after query A if the subspace of A intersects the subspace of B. (Actually this is a sufficient condition, not a necessary one.) For example the query

DELETE FROM songs WHERE author_id=7 AND song_id=3 :: [song_id=3, author_id=7]

affects the results of

SELECT COUNT(*) FROM songs WHERE author_id = 7 :: [author_id=7]

and you can tell just by comparing [author_id=7] with [song_id=3, author_id=7], without looking at the queries. Of course this is a safe approximation: sometimes cache invalidation is not necessary, but performing it ensures the correctness of the algorithm.

Now, notice that there may be infinitely many queries which need to be invalidated, as there are many subspaces that intersect a given one. For example a query with subspace [author_id=7] must invalidate all queries with subspace [song_id=1], as well as [song_id=2], etc. Moreover, there are infinitely many possible queries with the same subspace. Frameworks that try to keep track of all intersecting queries are doomed either to store everything in a large, permanent storage (which kind of defeats the purpose of an in-memory cache, to me) or to eventually forget about a query which needs to be invalidated (which is an error, to me).

The solution is much simpler, though, and does not involve operating on infinitely many cache keys... only exponentially many. But don't worry, it is exponential in the number of dimensions, not in the size of the dataset.
For example, for a binary relation we need to invalidate 2^2 = 4 keys. If you have a table with many columns, please don't be alarmed either -- you just need to choose a few columns on which you'd like to focus, and crop the subspace to those few columns. You will invalidate things more often than necessary, but will not get hit by the exponential blow-up. I've never experienced a case with more than 3 relevant columns, and 2^3 = 8 is not a big number.

We need some shorter notation for subspaces. Let us write them as 2-dimensional vectors (similar to rows in the table). Instead of [song_id=3, author_id=7], let us write (3,7), and for [song_id=3], we will use (3,*). The whole space is (*,*), and [author_id=4] is just (*,4).

Now the tricky part. Let us introduce a new wildcard symbol "?", which will mean something along the lines of "all subspaces which have a number here". For example (?,3) is a shorthand for the infinite set of subspaces {(0,3),(1,3),(2,3),...}. There is no particular query that has subspace (?,3). This artificial notation corresponds to all queries which have the second dimension fixed.

I assume that your global cache (such as memcache) supports an atomic increment operation, which is very convenient for the task of maintaining a "revision number", which we will associate with each subspace (artificial or not). When I say "with each", I don't mean buying an infinitely large computer -- we will just allocate memory for subspaces actually used in production, and those unused (or rarely used, if you have an LRU garbage collector) will simply take no memory at all.

A short note about evictions: if a get or increment results in a miss, simply set the revision to the current timestamp.

I also assume that you know the idea of generational caching, that is, using a key name with an appended revision to store cached data (such as "get_songs_count.123").
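The revision-per-subspace and generational-key ideas can be sketched as follows. This is a toy model under my own names: a plain dict stands in for memcache, and `bump` plays the role of the atomic increment.

```python
import time

cache = {}  # stands in for the global memcache

def get_revision(subspace):
    key = ("rev", subspace)
    if key not in cache:                  # miss (e.g. after an eviction):
        cache[key] = int(time.time())     # seed with the current timestamp
    return cache[key]

def bump(subspace):
    # stands in for memcache's atomic increment
    key = ("rev", subspace)
    cache[key] = cache.get(key, int(time.time())) + 1

def result_key(query, subspace):
    # cached data lives under query-name + revision; bumping the revision
    # makes every older key unreachable (LRU eventually reclaims it)
    return "%s.%d" % (query, get_revision(subspace))

k1 = result_key("get_songs_count", ("*", 7))
bump(("*", 7))                            # a write touching [author_id=7]
k2 = result_key("get_songs_count", ("*", 7))
assert k1 != k2                           # the old cached result is now unreachable
```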
This way, a single operation of incrementing a revision can invalidate all data stored at key names which depended on it. This seems like a waste of memory, but an LRU garbage collector can take care of the leftovers. The nice thing about it is that while a single integer (the revision number) must be stored in the global cache, the potentially large query result can be stored in the local cache, from where it can be quickly retrieved without any bottlenecks of the global cache. This is actually the only way (except for using silly small TTLs) that I know of and use for invalidating data stored in a local cache.

Now the algorithm. Or actually two algorithms, one for read-only queries and one for writes.

When you perform a read query A, first determine its subspace. I assume you have d columns, so the subspace can be expressed as a d-ary vector of numbers and stars, as explained above. For example the query "SELECT song_id FROM songs WHERE author_id = 3" corresponds to the vector (*,3).

Then compute all possible variations of this d-ary vector, by using a single substitution rule: you can put "?" in place of a number. For example, if your original vector was (*,3), this results in two vectors: (*,?) and the original. If, on the other hand, your query had subspace (7,3), then you would end up with four variations: (7,3), (?,3), (7,?), (?,?).

Then, using a bulk (multi)get, fetch from the global cache the revisions for those (at most 2^d) subspaces. Concatenate the revisions into one big revision (for example 123.3.114.14). Then fetch from the local cache the cached result of the query, which should reside at the key which is a concatenation of the query string (or a hash of it, if you prefer) and the big concatenated revision. If there is a miss in the local cache, try the global cache. If there is a miss in the global cache, go to the database. Then add the missing keys to the caches as needed.
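The read path above can be sketched like this. Again a toy model with my own names: dicts stand in for the local and global caches, the revisions are fetched one by one instead of with a real multiget, and a miss seeds the revision with 1 rather than a timestamp.

```python
from itertools import product

def read_variants(subspace):
    """All ways to put '?' in place of a number; the '*'s stay put (<= 2^d)."""
    return [tuple(v) for v in
            product(*[(c,) if c == "*" else (c, "?") for c in subspace])]

def cached_read(query, subspace, local, global_, db_fetch):
    # 1. fetch the (at most 2^d) revisions from the global cache
    revs = [global_.setdefault(("rev", v), 1) for v in read_variants(subspace)]
    # 2. concatenate into one big revision and build the generational key
    key = "%s.%s" % (query, ".".join(map(str, revs)))
    # 3. local cache, then global cache, then the database
    if key in local:
        return local[key]
    if key not in global_:
        global_[key] = db_fetch()
    local[key] = global_[key]
    return local[key]

print(read_variants(("*", 3)))   # [('*', 3), ('*', '?')]
```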
When you perform a write query B, determine its subspace. Then compute exactly 2^d possible variations of it, by using two substitution rules:
a) you can put "?" in place of "*"
b) you can put "*" in place of any number
For example, for a subspace (*,3) you'll end up with four variants: (*,3), (?,3), (*,*), (?,*). Then perform the query B. Finally, increment the revisions for all these variants.

As you can see, updates are a bit slower than reads (which in my opinion is usual, and nothing to be worried about). Some cache implementations allow bulk increments in one round-trip, in which case this might actually be faster than the read scenario, as you do not contact the local cache at all.

Now, some explanation of why it works. Imagine a bipartite graph which on the left-hand side has one vertex per possible subspace of a write query, and on the right side has vertices corresponding to subspaces of read queries. Actually both sets are equal, but we will focus on the edges. An edge goes from left to right if a query on the left side affects the results of a query on the right side. As said before, both sets are infinite, but that's not the problem. There are infinitely many edges, but that's also not bad. What's bad is that there are nodes on the left side with infinite degree, which means we would need to invalidate infinitely many queries. What the above tricky algorithm does is add a third layer to the graph, in the middle between the two, such that the transitive closure of the resulting graph is still the same (in other words: you can still get, using two edges, anywhere you could get by one edge in the original graph), yet each node on the left and each node on the right has finite (actually constant) degree. This middle layer corresponds to the artificial subspaces with "?" marks, and serves as a connecting hub for all the mess.
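The write path, sketched in the same toy model (my names, a dict for the global cache; a production version would issue one bulk increment):

```python
from itertools import product

def write_variants(subspace):
    """Exactly 2^d variants: each '*' may become '?', each number may become '*'."""
    return [tuple(v) for v in
            product(*[("*", "?") if c == "*" else (c, "*") for c in subspace])]

def perform_write(global_, subspace, run_query):
    run_query()                                # the actual DELETE/INSERT
    for v in write_variants(subspace):         # one bulk increment in practice
        global_[("rev", v)] = global_.get(("rev", v), 0) + 1

assert set(write_variants(("*", 3))) == {("*", 3), ("?", 3), ("*", "*"), ("?", "*")}
```

Bumping ("*", "?") here is exactly what makes the cached_read sketch above recompute any read whose subspace fixes the second column.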
Now, when a query on the left executes, it needs to inform only its (small number of) neighbours about the change, moving the burden of reading this information to the right. That is, a query on the right side needs to check if there is a message in the "inbox" in the middle layer. So you can think about it as a cooperation where the left query makes one step forward and the right query makes one step back, to meet at the central place and pass the important information about the invalidation of the cache.

Another point of view is that two subspaces (v[1],...,v[d]) and (u[1],...,u[d]) intersect each other if and only if they agree on all positions without stars. Or in other words: there is no index k such that v[k]<>u[k] and v[k]<>* and u[k]<>*.

So, given a vector (v[1],...,v[d]), you can predict that any intersecting subspace will be described by a vector (u[1],...,u[d]) of some particular form. That is, it can be obtained from (v[1],...,v[d]) by using some simple substitution rules:
a) you can change any number to a star
b) you can change any star to any number
c) you cannot use both of the above rules for the same index k
The problematic rule b) allows infinitely many possibilities. To overcome this we introduced the question mark symbol, which represents "any number" without stating any particular number. A path of operations leading from (v[1],...,v[d]) to (u[1],...,u[d]) is then explored from the other end, by trying to substitute numbers in (u[1],...,u[d]) with question marks. If we can reach the same term from both ends, then we know that there is a path from one end to the other, even though there is an infinite maze in the middle.

As pointed out before, you can use this technique for a higher number of columns, or keep the focus on just a few of them at the expense of premature invalidation.
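The "same term from both ends" argument can be checked mechanically. The sketch below (my names, d = 2) verifies exhaustively over a tiny value universe that a write's variants and a read's variants share a vector exactly when the two subspaces intersect:

```python
from itertools import product

def intersects(v, u):
    """Subspaces intersect iff no position holds two different numbers."""
    return all(a == "*" or b == "*" or a == b for a, b in zip(v, u))

def read_variants(s):
    """Read side: each number may become '?'."""
    return set(product(*[(c,) if c == "*" else (c, "?") for c in s]))

def write_variants(s):
    """Write side: each '*' may become '?', each number may become '*'."""
    return set(product(*[("*", "?") if c == "*" else (c, "*") for c in s]))

# The meet-in-the-middle ("hub") property, checked exhaustively for d = 2:
universe = ["*", 1, 2]
for w in product(universe, repeat=2):
    for r in product(universe, repeat=2):
        assert bool(write_variants(w) & read_variants(r)) == intersects(w, r)
print("hub property holds")
```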
You can easily use this for a topology without local caches, by storing everything in the global cache. You can adapt the technique to columns storing strings, or nulls, or whatever. You can use it for NoSQL databases if you need to. But this is not the end of the possible improvements.

If you know in advance what kind of write queries you will perform -- that is, if you know what subspaces they will have -- you can "trim" the graph to a smaller one, which will result in faster reads and writes, as you will not have to fetch that many (is 4 many?) keys.

For example, if your modifications are just single-row operations (INSERT one row, DELETE one row), then your modifying queries' subspaces never have stars in them. You can immediately see from the substitution rules that you will never ever increment revisions of artificial subspaces (those with question marks), as "?" can only be made from "*". If so, then it is not necessary to fetch them, or even think about them. You can safely ignore them. Now, read queries only depend on revisions of vectors which differ from the original vector by replacing something with question marks. Since such revisions are never bumped in this setting, you only need to fetch the revision of the original vector, which obviously has no question marks. This allows for a quite fast and simple implementation, which fetches only a single key from the global cache. Actually, my original framework forbade any modifications other than row-by-row just to get these extra-fast reads. I believe that in many real-world scenarios you never INSERT more than a single row at once. I've seen cases where you had an urge to DELETE many rows at once (for example when cleaning up dependent records), and may feel bad about the idea of doing it row by row.
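The trimmed single-row setting is small enough to sketch in full (my names, a dict for the global cache): reads fetch exactly one revision, writes bump the 2^d star-substitutions of their star-free vector.

```python
from itertools import product

def single_row_read_key(query, subspace, global_):
    # reads fetch only the original vector's revision -- no '?' variants needed
    rev = global_.setdefault(("rev", subspace), 1)
    return "%s.%d" % (query, rev)

def single_row_write(global_, subspace):
    # star-free write vector: each coordinate either stays or becomes '*'
    for v in product(*[(c, "*") for c in subspace]):
        global_[("rev", v)] = global_.get(("rev", v), 0) + 1

g = {}
k_before = single_row_read_key("get_songs_count", ("*", 7), g)
single_row_write(g, (3, 7))          # INSERT (3,7) touches the (*,7) line
k_after = single_row_read_key("get_songs_count", ("*", 7), g)
assert k_before != k_after           # the cached count is correctly invalidated
```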
Think about the amortized cost, though: each of the rows you delete had to be added at some point in time, and if you already feel comfortable with the idea of adding rows one by one, then you have already paid enough credit not to worry about the deletes.

That's it, folks. Now, which frameworks implement this idea, in which paper can I read about it, and by how many years did I oversleep?

--
You received this message because you are subscribed to the Google Groups "memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
