Microsoft's answer to memcached: http://www.25hoursaday.com/weblog/2008/06/06/VelocityADistributedInMemoryCacheFromMicrosoft.aspx
Ryan

On Fri, Jun 20, 2008 at 2:42 PM, Daniel <[EMAIL PROTECTED]> wrote:
> Hi
>
> Thanks Dean and Josef for your responses...
>
> Josef, what's the name of the Microsoft caching software? I'm not
> familiar with it.
>
> I know what you are saying is true to a degree, but I think it is worth
> doing, so that every application that uses a database could gain the
> benefits of memcached without requiring changes to the application code.
>
> Wouldn't it be nice to get the speed boost of caching in all parts of
> your application without needing to complicate your code with both
> memcached requests and database requests?
>
> I'm not aware of any open source database with a memory caching system
> that can be as large, as fast, or as distributed as memcached... It
> truly is a brilliant solution as is.
>
>> Integrating memcached into a database server API wouldn't be hard, but
>> I'm not sure it wouldn't cause a lot more problems than writing a
>> database caching system from scratch. What you're talking about would
>> require a great deal of work on the server's part to track which
>> memcached server the data is stored on, to make sure that data is
>> replaced when an update to the database is made, etc.
>
> Why would it take so much work on the server's part to track which
> memcached server the data is stored on? Couldn't the core database just
> use a hash? In fact, couldn't the hashing space be dynamically
> controlled by the core database, so it could move hashing space from one
> caching database daemon (CDD) to another?
>
> Of course, this solution should include all of the current memcached
> API, to support current users and to allow fast caching/sharing of
> application data that doesn't need to be backed by a database.
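The hash-based routing above can be sketched as a consistent-hash ring: every row key maps to exactly one CDD, and moving hash space between daemons only remaps the keys in the moved slots. A minimal Python sketch, with all node names and the `HashRing` class invented for illustration:

```python
import hashlib
from bisect import bisect


class HashRing:
    """Toy consistent-hash ring mapping row keys to CDD nodes.

    Hypothetical sketch: node ids and virtual-node count are
    illustrative, not part of memcached or any real database."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points on the ring so hash
        # space spreads evenly and can be rebalanced in small slices.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, row_key):
        """Return the CDD that owns this row's hash slot."""
        h = self._hash(row_key)
        idx = bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cdd-1", "cdd-2", "cdd-3"])
owner = ring.node_for("users:42")  # deterministic: same key, same CDD
```

The core database could hold the ring authoritatively and republish it when it shifts hash space from one CDD to another.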
>> From what I understand of what you're asking, you basically want the
>> database to be able to cache the result of a given query with a given
>> set of parameters, so that if the query is made a second time with the
>> exact same parameters it can just "look it up" in its cache and return
>> the results directly.
>
> No, that's the dumb way of caching. Surprisingly, even that way of
> caching can provide incredible performance gains at times, but I'm going
> to describe what I believe to be a much smarter way.
>
> In essence, every row from any table is cached as an element. The CDD
> has enough database smarts to process the SQL, joins, etc. across tables
> and indexes. The difference is that, rather than having to go through
> the core database for every read, far more data that is known to be good
> will be available in the distributed cache.
>
>> Then, the database would have some mechanism so that it would "know"
>> when those cached queries are made invalid by *other* queries and
>> automatically evict the invalidated results from the cache.
>
> There's the beauty and the challenge... By having the core database
> communicate back to the cache when an update occurs, the cached data
> stays current.
>
>> If it were really that simple, believe me, they'd all be doing it.
>> That'd kill your TPC-C scores!
>
> By "kill", I assume you mean show a factor of 10 improvement or more???
> I don't know, so I'll detail my ideas on an implementation.
>
> For discussion, let's only talk about the database type of queries. For
> the record, I believe the caching database daemon (CDD) should
> essentially implement the current memcached API, if for no other reason
> than that is how I see the different CDDs communicating amongst
> themselves without bothering the core database.
>
> The amount of memcached support implemented in the CDD is completely
> variable.
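The row-as-cache-element model above, with the core database invalidating rows on update, might look like this in Python (the `RowCache` class and its key scheme are hypothetical, chosen only to illustrate the idea):

```python
class RowCache:
    """Toy per-row cache: every table row is one cache element,
    keyed by table name plus primary key (names are illustrative)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def row_key(table, pk):
        return f"{table}:{pk}"  # e.g. "users:42"

    def get(self, table, pk):
        return self._store.get(self.row_key(table, pk))

    def put(self, table, pk, row):
        self._store[self.row_key(table, pk)] = row

    def invalidate(self, table, pk):
        # Called when the core database reports an update to this row,
        # which is what keeps the cached data current.
        self._store.pop(self.row_key(table, pk), None)
```

A multi-row query would then be answered by per-row gets against this cache rather than by caching whole result sets, so one update invalidates one row instead of every query that touched it.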
> Although I see it evolving to the point where the cached data includes a
> timestamp in order to work smoothly with transactions, it could provide
> a significant database performance boost by just pre-processing the SQL
> requests and caching non-transaction-related data.
>
> Let's look at how I see simple interactions working:
>
> On the database side, imagine a key/value pair is hashed and distributed
> for every database row.
>
> Suppose an SQL query requests the most recent data from a single row,
> not in a transaction. The CDD decodes the SQL, determines the "hash" for
> that row, and returns the data from a memcached get. If the get fails,
> then the CDD controlling that hash will request the row from the
> database core, so there are no issues of duplicate requests or race
> conditions.
>
> In the case of an SQL query involving multiple rows from one or more
> tables, the CDD gets a list of all rows from the table, or from an index
> if appropriate, and processes each row as above.
>
> Simple row updates not involved in a transaction are again passed
> through the appropriate CDD, to avoid race conditions.
>
> Now for some magic:
>
> During a transaction things can get much more complicated, but we can
> start by handling the simple cases. In the simplest transaction, when it
> is committed, the core database sends each CDD a list of its rows that
> were modified, on a priority channel. The priority channel is used so
> the CDD will expire the affected rows, and optionally add the new rows,
> before it handles any more normal requests.
>
> Within a transaction, row data requests can include a timestamp or
> revision index in an attempt to get data that was current at an earlier
> point in time. I believe this will then allow the caching system to
> duplicate the functionality of an MVCC database.
>
> Transactional updates will be passed to the core database. The core
> database will be modified so it, too, can take advantage of the cache.
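The read path and the commit-time invalidation described above could be sketched like so; the `CDD` class, the `read_row` interface on the core database, and the `on_commit` callback (standing in for the priority-channel message) are all invented for illustration:

```python
class CDD:
    """Sketch of one caching database daemon (hypothetical design)."""

    def __init__(self, core_db):
        self.core_db = core_db  # authoritative store; anything with read_row()
        self.cache = {}         # row_key -> row data

    def get_row(self, row_key):
        row = self.cache.get(row_key)
        if row is None:
            # Cache miss: only the CDD owning this hash slot fetches
            # from the core database, avoiding duplicate requests and
            # race conditions.
            row = self.core_db.read_row(row_key)
            self.cache[row_key] = row
        return row

    def on_commit(self, expired_keys, new_rows=None):
        # Priority-channel message from the core database after a
        # commit: expire the affected rows (and optionally install the
        # new versions) before serving any further normal requests.
        for key in expired_keys:
            self.cache.pop(key, None)
        self.cache.update(new_rows or {})


class FakeCoreDB:
    """Stand-in core database, just for the sketch."""

    def __init__(self, rows):
        self.rows = rows

    def read_row(self, row_key):
        return self.rows[row_key]
```

Handling `on_commit` ahead of normal traffic is what keeps a reader from seeing a stale row after the commit has been acknowledged.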
> Instead of going to the disk drive to request a row, it may request the
> data from the appropriate CDD.
>
> And now, when things go wrong...
>
> In my understanding of this, things become the most complicated when
> communication between the nodes fails. My current best idea involves
> heartbeat sharing of disabled nodes. The goal is that when any node
> cannot talk to another, it disables that node and tells every other node
> about the problem on a priority channel. The calling node then falls
> back on the core database to handle all requests for that node.
>
> When the connection is restored, that node gets updated or reset by the
> core database before restarting.
>
> So, in conclusion, the end goal of this is to provide memcached-type
> caching to the database in such a way that the data it returns is always
> accurate. I'm not saying this would be easy, but it does seem to be well
> worth the effort.
>
> Thanks
>
> Daniel
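The failure handling described above (disable the unreachable node, tell the other peers on a priority channel, fall back to the core database until the node is reset) could be sketched like this; the `NodeStatus` class, message tuples, and node names are all invented for illustration:

```python
class NodeStatus:
    """Sketch of one node's heartbeat-driven view of its peers."""

    def __init__(self, peers):
        self.peers = set(peers)
        self.disabled = set()

    def mark_unreachable(self, node):
        # Disable the node locally and build priority-channel messages
        # telling every other reachable peer to do the same.
        self.disabled.add(node)
        return [("priority", peer, ("disabled", node))
                for peer in sorted(self.peers - self.disabled)]

    def route(self, node):
        # While a node is disabled, its requests fall back to the
        # core database.
        return "core-db" if node in self.disabled else node

    def restore(self, node):
        # Once the core database has updated or reset the node, it can
        # be re-enabled and resume serving its hash slots.
        self.disabled.discard(node)
```

One open question the sketch glosses over is partition asymmetry: two nodes that can each reach the core database but not each other would disable one another, so the core database likely needs to arbitrate.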