On 9/22/10 10:23 AM, Les Mikesell wrote:
On 9/22/2010 11:59 AM, Matt Ingenthron wrote:
On 9/22/10 6:12 AM, ligerdave wrote:
MongoDB is actually a "cached" db, meaning that most of its records are
in memory.

I think there is also a memcached and DB hybrid which comes w/ a
persistence option. I think it's called MemcacheDB, which runs an
in-memory db (like mongodb). It shares most of its API w/ memcached,
so you don't have to change your code very much.

membase is compatible with the memcached protocol, has a 20MByte default
object size limit, and lets you define memory and disk usage across nodes in
different "buckets".

MemcacheDB is challenging to deploy for a few reasons, one of which is
that the topology is fixed at deployment time.

Does anyone know how these would compare to 'riak', a distributed database that can do redundancy with some fault tolerance and knows how to rebalance the storage across nodes when they are added or removed? (Other than the different client interface...).

This is a very detailed question, but...

Without going too much into advocacy (I'd defer you to the membase list/site), membase does have redundancy and fault tolerance, and it can rebalance when nodes are added and removed. The interface to membase is the memcached protocol. Membase does all this by making sure there is an authoritative place for any given piece of data at any given point in time. That doesn't mean data's not replicated or persisted, just that there are rules about the state changes for a given piece of data based on vbucket hashing and a shared configuration.
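
To make the vbucket idea concrete, here is a minimal sketch, not membase's actual implementation. The vbucket count, the CRC32-modulo hash, the round-robin initial map and the helper names (vbucket_for, build_map, move_vbucket, server_for) are all assumptions for illustration. The point it shows: a key always hashes to the same vbucket, and only the shared vbucket-to-server map changes on a rebalance, so there is exactly one authoritative server for a key at any time.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed value, chosen only for this illustration


def vbucket_for(key: str) -> int:
    """Hash a key to a vbucket id. The key -> vbucket mapping never changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    """Hypothetical shared configuration: one owning server per vbucket."""
    return [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]


def move_vbucket(vbucket_map, vbucket_id, new_server):
    """Rebalance step: hand one vbucket to another server (returns a new map)."""
    updated = list(vbucket_map)
    updated[vbucket_id] = new_server
    return updated


def server_for(key: str, vbucket_map) -> str:
    """The single authoritative server for this key under the given map."""
    return vbucket_map[vbucket_for(key)]


servers = ["node-a", "node-b", "node-c"]
config = build_map(servers)
vb = vbucket_for("user:42")
rebalanced = move_vbucket(config, vb, "node-d")
print(server_for("user:42", config), "->", server_for("user:42", rebalanced))
```

Clients and servers that share the same map agree on the owner of every key, which is what makes the state-change rules enforceable.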

This was actually inspired by similar concepts that were in memcached's codebase up through the early 1.2.x releases, but were not in use anywhere that I'm familiar with.

riak is designed more around eventual consistency and lots of tuning of W+R>N, meaning that it is designed to always take writes and to deal with consistency on reads by doing multiple reads. This is different from memcached in that memcached expects one and only one location for a given piece of data with a given topology. If the topology changes (node failures, additions), things like consistent hashing dictate a new place, but there aren't multiple places to write to.
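
The W+R>N rule can be checked by brute force. The sketch below is plain arithmetic, not riak's API; the function names are made up for illustration. With N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to share at least one replica exactly when W+R>N.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The rule of thumb: read and write sets must overlap when W + R > N."""
    return w + r > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute force: does every W-sized write set share a replica with every R-sized read set?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# N=3, W=2, R=2: every read sees at least one replica that took the latest write.
print(quorums_overlap(3, 2, 2), always_intersect(3, 2, 2))
# N=3, W=1, R=1: a read can land on a replica that missed the write.
print(quorums_overlap(3, 1, 1), always_intersect(3, 1, 1))
```

Lower W and R favor availability and latency; higher values favor reading the latest write. That is the tuning trade-off referred to above.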

Any time you accept concurrent writes in more than one place, you have to deal with conflict resolution. In some cases this means dealing with it at the application level.

I don't know it well, but it's my understanding that MemcacheDB is really just memcached with disk (BDB, IIRC) in place of memory on the back end. This has been done a few different times and in a few different ways. Topology changes are the killers here. Consistent hashing can't really help you deal with changes in this kind of deployment.
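
One way to read the topology problem, as a rough sketch: the "nodes" below are plain dicts standing in for persistent stores, and simple modulo hashing stands in for whatever the client actually uses. Consistent hashing would move fewer keys, but the keys that do move hit the same problem. None of this is MemcacheDB code.

```python
import zlib


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node index from the key hash and the current node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def write(stores, key, value):
    stores[node_for(key, len(stores))][key] = value


def read(stores, key):
    return stores[node_for(key, len(stores))].get(key)


keys = [f"user:{i}" for i in range(1000)]
stores = [{} for _ in range(3)]  # three persistent nodes
for k in keys:
    write(stores, k, "v1")

# Add a fourth node without migrating any data.
stores.append({})
misses = [k for k in keys if read(stores, k) is None]
print(len(misses), "of", len(keys), "keys are on disk but no longer reachable")

# Update one moved key under the new topology, then lose the new node.
moved_key = misses[0]
write(stores, moved_key, "v2")
stores.pop()
stale_value = read(stores, moved_key)
print(moved_key, "now reads", stale_value)  # the old copy resurfaces
```

With a pure cache, a miss just means a trip to the real database. With a persistent store, the old copy never goes away on its own, so a topology change can hide data or bring back stale data unless something migrates it.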

- Matt
