On 9/22/10 10:23 AM, Les Mikesell wrote:
On 9/22/2010 11:59 AM, Matt Ingenthron wrote:
On 9/22/10 6:12 AM, ligerdave wrote:
MongoDB is actually a "cached" DB, meaning that most of its records are
in memory.
I think there is also a memcached/DB hybrid that comes with a
persistent option. I believe it's called MemcacheDB, and it runs an
in-memory DB (like MongoDB). It shares most of the common API with
memcached, so you don't have to change your code very much.
membase is compatible with the memcached protocol, has a 20 MByte default
object size limit, and lets you define memory and disk usage across nodes
in different "buckets".
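Because membase speaks the memcached protocol, an existing client's requests should work unchanged. As a rough, generic illustration (not membase code), here is how memcached text-protocol set/get requests are framed. The size check mirrors the 20 MByte default mentioned above. It is an assumed client-side guard, and the constant assumes 20 * 1024 * 1024 bytes.

```python
# Assumption: "20 MByte" taken as 20 * 1024 * 1024 bytes; a hypothetical client-side guard.
MAX_OBJECT_BYTES = 20 * 1024 * 1024


def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a memcached text-protocol 'set' request."""
    if len(value) > MAX_OBJECT_BYTES:
        raise ValueError("value exceeds the assumed object size limit")
    # memcached keys are at most 250 characters and may not contain whitespace.
    if not key or len(key) > 250 or any(c.isspace() for c in key):
        raise ValueError("invalid memcached key")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get_command(key: str) -> bytes:
    """Frame a memcached text-protocol 'get' request."""
    return f"get {key}\r\n".encode("ascii")
```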
MemcacheDB is challenging to deploy for a few reasons, one of which is
that the topology is fixed at deployment time.
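To see why a fixed topology matters, consider the common client-side approach of placing keys with hash(key) mod N. This is a generic sketch, not MemcacheDB's actual code. The MD5-based hash and the key names are assumptions. Changing N remaps most keys, and with a persistent store that means most data now sits on the "wrong" node.

```python
import hashlib


def server_for(key: str, num_servers: int) -> int:
    # Simple modulo placement: hash the key, then take it mod the server count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    # Share of keys whose server changes when the cluster size changes.
    moved = sum(1 for k in keys if server_for(k, old_n) != server_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.1%} of keys change servers going from 4 to 5 nodes")
```

With a well-mixed hash, a key stays put only when hash mod 4 equals hash mod 5. That happens for about 4 of every 20 hash values, so roughly 80% of keys move.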
Does anyone know how these would compare to 'riak', a distributed
database that can do redundancy with some fault tolerance and knows
how to rebalance the storage across nodes when they are added or
removed? (Other than the different client interface...).
This is a very detailed question, but...
Without going too much into advocacy (I'd refer you to the membase
list/site for that), membase does have redundancy and fault tolerance,
and it can rebalance when nodes are added or removed. The interface to
membase is the memcached protocol. Membase provides these features by
making sure there is an authoritative place for any given piece of data
at any given point in time. That doesn't mean data isn't replicated or
persisted, just that there are rules governing the state changes for a
given piece of data, based on vbucket hashing and a shared configuration.
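The vbucket idea can be sketched as two lookups. First, hash the key to one of a fixed number of vbuckets. Then look up which server currently owns that vbucket in a map shared by clients and servers. Several details are illustrative assumptions, not membase's actual parameters or hash function: the plain CRC32 mod, the 8-vbucket map, and the node names.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class VBucketConfig:
    # Shared configuration (hypothetical shape) that clients and servers agree on.
    servers: List[str]
    vbucket_map: List[int]  # vbucket id -> index into servers (the authoritative copy)

    def vbucket_for(self, key: str) -> int:
        # The number of vbuckets is fixed; only their ownership changes.
        return zlib.crc32(key.encode("utf-8")) % len(self.vbucket_map)

    def server_for(self, key: str) -> str:
        return self.servers[self.vbucket_map[self.vbucket_for(key)]]


config = VBucketConfig(servers=["node-a", "node-b"], vbucket_map=[i % 2 for i in range(8)])
vb = config.vbucket_for("user:42")
owner = config.server_for("user:42")
# Rebalance: hand that one vbucket to the other server by editing the shared map.
config.vbucket_map[vb] = 1 - config.vbucket_map[vb]
```

In this model, rebalancing means moving one vbucket's data and updating its map entry. Keys in every other vbucket keep their authoritative location.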
This was actually inspired by similar concepts that were in memcached's
codebase up through the early 1.2.x releases, but that were not in use
anywhere I'm familiar with.
Riak is designed more around eventual consistency, with a lot of tuning
around W+R>N. It is built to always accept writes and to deal with
consistency on reads by reading from multiple replicas. This is different
from memcached, which expects one and only one location for a given piece
of data with a given topology. If the topology changes (node failures,
additions), something like consistent hashing dictates a new place, but
there are never multiple places to write to.
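The W+R>N rule is about quorum overlap. Suppose there are N replicas, every write reaches W of them, and every read consults R of them. Then W + R > N guarantees that every possible read set shares at least one replica with the most recent write set. The brute-force check below illustrates the rule. It is not Riak code.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    # True if every possible write set of size w intersects every read set of size r.
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# e.g. N=3, W=2, R=2 overlaps; N=3, W=1, R=1 can miss the latest write.
print(quorums_always_overlap(3, 2, 2), quorums_always_overlap(3, 1, 1))
```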
Any time you accept concurrent writes in more than one place, you have
to deal with conflict resolution. In some cases this means dealing with
it at the application level.
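Vector clocks are a common way to detect concurrent writes, used by Dynamo-style stores such as Riak. When two versions are concurrent, neither supersedes the other, and the application must merge them. The sketch below is generic. The shopping-cart union rule in `resolve_cart` is a hypothetical application-level policy.

```python
from typing import Dict, Set, Tuple

VClock = Dict[str, int]
Version = Tuple[VClock, Set[str]]  # (vector clock, cart items)


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event that clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def concurrent(a: VClock, b: VClock) -> bool:
    return not descends(a, b) and not descends(b, a)


def merge_clocks(a: VClock, b: VClock) -> VClock:
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def resolve_cart(first: Version, second: Version) -> Version:
    # If one version supersedes the other, keep it; otherwise apply the
    # hypothetical app rule: union the cart items and merge the clocks.
    (clock_a, items_a), (clock_b, items_b) = first, second
    if descends(clock_a, clock_b):
        return first
    if descends(clock_b, clock_a):
        return second
    return merge_clocks(clock_a, clock_b), items_a | items_b
```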
I don't know it well, but it's my understanding that MemcacheDB is
really just memcached with disk (BDB, IIRC) in place of memory on the
back end. This has been done a few different times and in a few
different ways. Topology changes are the killers here. Consistent
hashing can't really help you deal with changes in this kind of deployment.
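Consistent hashing limits how many keys change owner when a node is added, but it cannot eliminate the move. In a cache, a remapped key just becomes a miss. With disk-backed storage, the old node still holds the data. The minimal ring sketch below shows which keys would be stranded when a fifth node joins. The MD5 hash, 100 points per node, and the node names are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, points_per_node: int = 100):
        # Each node gets several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c", "d", "e"])
# Keys whose owner changed: in a persistent store, their data is still on the old node.
stranded = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

Roughly a fifth of the keys move, all of them to the new node "e". Without a migration step, their persisted values stay on the nodes they came from.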
- Matt