I think this is a bit off topic now... There are two separate threads going on: the first is people giving many examples of how using memcached is actually pretty easy, and the other is about an in-memory database with loose specs... I'd love to have brainstorming sessions on memcached design patterns, but I'm really not seeing how this database could be performant at all.
> You are absolutely correct... But wouldn't it be worth making a
> memcached/database combo that's overcomplicated if we could get a
> performance boost from every app that uses it just like they use a
> database?

If that's easy, go ahead and patch mysql to do it? I think we can go back and forth on poo-poo'ing ideas, but if you can prove it wrong then that's fine. That needs specs, proofs of concept, algorithmic tests, etc.

> First, in-memory databases are, or at least can be, far faster than
> databases requiring a spinning disk access. Flash disk accesses and
> remote in-memory accesses are somewhere in between.

Eh, I'll expand a little: take a machine with 32G of RAM, where your dataset is 20G (data + indexes + overhead in innodb). Your innodb logfiles are 256 megs apiece, and you have 512M of battery-backed write cache over an 8-disk RAID10 or whatever. Fire up your database, then select * from all tables with an impossible constraint on each index (a ghetto trick for preloading all indexes + data in innodb; see mysqlperformanceblog). As you do further reads, innodb converts what it can into a hash table, and the writes you commit land in the writeback memory. So you're not blocked on disk unless you start writing very heavily, or your dataset is bigger than memory. Most of us use these BBUs now... but it's irrelevant. We statically cache the results of database queries to avoid the overhead of the database parser/optimizer/query execution and the storage engine's data fetch/conversion. We do it because memcached itself is distributed, and adding more nodes helps.

> Even with a full fledged database cluster doing the work, the cluster
> ends up running more slowly because it has to handle all of the extra
> database needs: replication, journaling, failover, version control,
> blocking, etc etc.

Sure? I guess? Replication's not much overhead...
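That preload trick looks roughly like this; the table and column names below are made up for illustration, and the exact queries vary (see mysqlperformanceblog for the real writeups). The idea is just that each query's unsatisfiable-by-index constraint forces a full scan of one index, pulling its pages into the InnoDB buffer pool:

```sql
-- Hypothetical table: users(id INT PRIMARY KEY, email VARCHAR(255) KEY, payload TEXT).
-- A LIKE with a leading wildcard can't seek, so InnoDB walks the whole
-- secondary index, loading its pages into the buffer pool:
SELECT COUNT(*) FROM users WHERE email LIKE '%no-such-value%';
-- Filtering on a non-indexed column forces a full scan of the
-- clustered (primary key) index, i.e. the row data itself:
SELECT COUNT(*) FROM users WHERE payload LIKE '%no-such-value%';
```

Run one such query per index per table after startup and you've paid the disk cost up front instead of on live traffic.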
> In this system, I'm describing a system where the app and the CDD are
> on the same system to avoid extra remote accesses, since the CDD is
> running on the app machine. More important, however...

That doesn't make a huge difference... Everyone thought it would, which is why no one had really popularized memcached until brad wrote it and started proving that it wasn't such a huge deal. Even then, communicating with a process on the same machine isn't instantly fast: you have scheduler overhead and system CPU blown on the pipe.

Sure, network roundtrips suck. Yawn. Batch what you can into the same network roundtrip (multiget), and for the rest, the time your apps spend idle on network instead of CPU is the amount of extra parallel processes you can fire up on your app servers. If my individual processes are spending 50% of their time in CPU land and 50% of their time in wait, I can run 2x per core on the box. The only real limiting factor it gives you is rendering time, and if you're not querying a system that's as *dead simple* as memcached, you're going to lose that overhead regardless.

(I'm skipping the oracle bit; too much bias)

> I just thought of another way of describing it. The CDD's are like
> Reader Databases, while the core database handles all writing.

I don't really understand why this is better... you're caching more redundant data everywhere. Memcached gives you higher cache efficiency by letting you use every last drop of storage you throw at it. Any amount of redundancy costs you cache size, and cache misses are almost guaranteed to suck more than the network roundtrip to memcached.

> Yes, that is a better memcached-specific application design. What I'm
> suggesting will be significantly slower than an integrated,
> designed-for-memcached app. That's why I believe it seems worthwhile to
> include the current memcached functionality in the CDD.

That's fine.
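The multiget batching point can be sketched like this. `FakeClient` is a dict-backed stand-in I'm making up for illustration (real clients like pylibmc/pymemcache expose the same get / multi-get shape); the only thing it demonstrates is the roundtrip count:

```python
# Stand-in for a memcached client, counting network roundtrips.
class FakeClient:
    def __init__(self):
        self.store = {}
        self.roundtrips = 0

    def set(self, key, value):
        self.store[key] = value

    def get(self, key):
        self.roundtrips += 1          # one roundtrip per key
        return self.store.get(key)

    def get_multi(self, keys):
        self.roundtrips += 1          # one roundtrip for the whole batch
        return {k: self.store[k] for k in keys if k in self.store}

mc = FakeClient()
for i in range(10):
    mc.set("user:%d" % i, {"id": i})

# Naive loop: 10 keys, 10 roundtrips.
for i in range(10):
    mc.get("user:%d" % i)
print(mc.roundtrips)                  # 10

# Batched: same 10 keys, 1 more roundtrip total.
users = mc.get_multi(["user:%d" % i for i in range(10)])
print(len(users), mc.roundtrips)      # 10 11
```

Same data either way; the batched version just stops paying the per-key latency tax.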
I'd rather my whole app be fast and my developers understand what's going on... Simple abstractions and design patterns in the application can still give you most of that benefit, as Dustin continues to show with many examples. What's better is that it'll give you *more* than just the "my app thinks it's a database but it's a *little* faster!" benefit, since you'd have already built memcached into your app, so it'll be simpler to plug in anywhere else. It reads like you want to try really hard to get a 50% speedup, instead of trying a lot less hard to get a 90% speedup everywhere.

> I don't get it... Here in "memcached land" we're dealing with
> situations where if we DON'T warm up the cache before going live,
> sites can blow up; meanwhile people are saying/thinking that a generic
> memcached/database combination isn't worth the trouble.

Dustin touched on this too: if it hurts when you do that, then don't do it. Regardless of how your system is designed, if you turn it all on at the same time it won't work immediately. You have to preload data somewhere, or do the simple thing and let traffic in slowly. I yakked about this same problem at SXSW, using gaiaonline's many troubled feature launches as the example. The solution was trivial; it was a marketing problem.

> As another way of thinking about it, first implement the things that
> memcached can do easily, and let the more complex tasks fall through to
> the database. Surely that can be done without slowing the database down.

If it's easy to add those features, add them to memcached's base and submit patches, please! :) The server as-is already has that functionality, so surely you can build on that :)

> For the fun of it, let me revisit one of the more complex tasks, and see
> if you can't see how this could result in an incredible performance
> boost.

Map-reduce with cache? Something that has to touch every part of a system isn't fundamentally scalable; avoiding that is usually the design tradeoff you make to keep a query fast.
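The "simple abstraction" I keep pointing at is basically a get-or-compute wrapper: ask the cache, fall through to the real query on a miss, store the result. A minimal sketch, with a dict-backed stand-in for the client (any real client with get/set works the same way; the `ttl` argument is there for shape and ignored by the stand-in):

```python
# Cache-aside helper: check the cache, compute and fill on a miss.
def cached(client, key, ttl, compute):
    value = client.get(key)
    if value is None:
        value = compute()        # the expensive DB query
        client.set(key, value)   # a real client would also pass ttl
    return value

# Dict-backed stand-in for a memcached client, for illustration only.
class DictClient:
    def __init__(self):
        self.d = {}
    def get(self, key):
        return self.d.get(key)
    def set(self, key, value):
        self.d[key] = value

calls = []
def expensive_query():
    calls.append(1)              # track how often the "DB" is hit
    return "row-data"

mc = DictClient()
first = cached(mc, "q:user:1", 60, expensive_query)
second = cached(mc, "q:user:1", 60, expensive_query)
print(first, second, len(calls))  # row-data row-data 1
```

Once a wrapper like this exists, every query in the app can opt into caching with one line, which is the "plug it in anywhere else" part.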
-Dormando