Looking at database backends for the data we’re talking about storing, these 
are the ranked concerns that I have heard:

1. availability
2. performance
3. consistency and partition tolerance

Basically, if the service is up, that’s good.  The data that we are holding 
here is either very short lived (that related to the push) or easily repaired 
(device registrations of push URLs).

It’s a little more difficult when we get to revocation of URIs, or storing of 
state for URIs, but I’m going to leave that for a later discussion.

The main concern that I have here is service availability in light of data 
centre failures.  In my experience, these happen far more often than is 
acceptable.  For reference, when I was at Skype, Microsoft had numbers that 
were pretty close to their promise of 99.9% availability.

But I need to emphasise this - 99.9% is not good enough for a whole system.  
99.99% might be.  And because the cloud platform is only one component of a 
system, the overall availability will be lower.
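
To put rough numbers on that, here is a small sketch of the arithmetic.  It 
assumes that components fail independently and that the system needs all of 
them up at once, which is optimistic.  The 99.95% figure for our own service 
is a made-up input, not a measurement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def serial_availability(*components):
    """Availability of a system that needs every component to be up.

    Assumes failures are independent, which is an optimistic assumption.
    """
    result = 1.0
    for a in components:
        result *= a
    return result


if __name__ == "__main__":
    for a in (0.999, 0.9999):
        hours = downtime_hours_per_year(a)
        print(f"{a:.2%} -> {hours:.2f} hours down per year")
    # Hypothetical: a 99.9% platform underneath a 99.95% service of our own.
    combined = serial_availability(0.999, 0.9995)
    print(f"combined: {combined:.4%}")
```

At 99.9% that is a bit under nine hours of downtime a year from the platform 
alone, before anything we build on top of it has had a chance to fail.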

So geographic distribution of data is critical.  This is managed with varying 
degrees of sophistication by the different storage systems.  The biggest part 
is how they trade off responsiveness and fault recovery.  If updating a row 
requires a cross-geography request, that’s going to be slow, but it means that 
you get very good failure characteristics.  At the same time, it can also mean 
that you are unable to perform updates in certain types of failure scenario.

The key feature - one that most databases provide - is the ability to tune this 
to application needs.  We’re going to want to tune this initially so that 
updates don’t depend on full replication, since we care more about availability 
and performance than consistency and partition tolerance.
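
The trade-off can be sketched with a toy model.  This is not any particular 
database’s API; the Replica class, the write_latency_ms function and the 
latency figures are all invented for illustration.  The idea is only that a 
write completes once the required number of replicas have acknowledged it, so 
the setting decides both how slow a write is and whether it can complete at 
all during a failure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    latency_ms: float  # round trip from the node that accepts the write
    up: bool = True


def write_latency_ms(replicas: List[Replica], acks_required: int) -> Optional[float]:
    """Time until `acks_required` replicas have acknowledged a write.

    Returns None when too few replicas are reachable, meaning the write
    cannot complete at this setting.
    """
    if acks_required < 1:
        raise ValueError("acks_required must be at least 1")
    reachable = sorted(r.latency_ms for r in replicas if r.up)
    if len(reachable) < acks_required:
        return None
    # Requests go out in parallel, so we wait for the Nth fastest reply.
    return reachable[acks_required - 1]


if __name__ == "__main__":
    # Made-up latencies: two replicas in the local data centre, one remote.
    replicas = [
        Replica("local-a", 1.0),
        Replica("local-b", 2.0),
        Replica("remote-a", 80.0),
    ]
    for acks in (1, 2, 3):
        print(acks, write_latency_ms(replicas, acks))
    # Lose the remote data centre: requiring all three acks now blocks writes.
    replicas[2].up = False
    print(3, write_latency_ms(replicas, 3))
```

With a low acknowledgement count, writes stay local and fast and survive the 
loss of the remote site; requiring every replica pays the cross-geography 
round trip on each write and stops accepting writes when any site is down.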

Run-down of options

Now, unfortunately, I have only a small amount of background with these 
specific items.  My experience is with stuff that I believe some of you might 
think to be poisonous, plus some stuff we just can’t use.  Nonetheless, here 
are what I believe to be the high runner options:

MongoDB

This is very widely used, readily deployed to the cloud platform of your choice 
and it has a pretty good story when it comes to performance.

Mongo does have a geographic redundancy story.  It’s not covered in glory, but 
nor is it entirely embarrassing.

I’m less encouraged by the characteristics of the geographic redundancy 
features; replica sets can only be statically configured to prefer local 
communication, which means that a failure in the local cluster will result in 
cross-geo operations for all requests that block on replication.  I’m also a 
little concerned about how placement is controlled within a cluster.

http://docs.mongodb.org/manual/core/replica-set-architectures/

Cassandra

Cassandra is also widely used, with similar characteristics to Mongo.

This has a larger number of options with respect to geographic redundancy.  The 
topology aware replication mode offers some pretty good opportunities, 
particularly when deployed with a “snitch” that is aware of the deployment 
layout.

http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architectureDataDistributeReplication_c.html

Redis

Redis is probably the simplest option here.  Its high availability option is 
still unstable, so I’m not going to recommend it.

Dynamo

This would tie us to AWS, but it seems like a capable DB.  The problem is that 
in the short time I looked, I couldn’t uncover any details on their geographic 
redundancy story.  That is not encouraging.

Roll your own geo redundancy

This remains a viable option…if you really need the 
performance/availability/other characteristic.  Typically you take an existing 
store (pick one, any one) and you add your own brand of geographic redundancy 
to suit your needs.  This has the advantage of being exactly what you need, but 
the disadvantage of it being a bunch of extra work.  I’m not going to recommend 
this either; but it’s an option that may become worth considering later, unless 
our database friends really pick up their collective game.

Others

I could provide info on memcached(b), Azure, Riak, etc…  All of which have 
their merits, but none of which are really strong contenders for the crown.

Summary

I think that we could make either Mongo or Cassandra work for us.  If we were 
truly serious about storing hard state, then I think Cassandra offers more 
control, but I think that it would be a harder challenge from an operational 
perspective to use and deploy.

At this stage, I’m going to suggest that we pick Mongo, even though I think 
Cassandra might be functionally superior.

As long as we create and maintain a good abstraction for our data store, I 
think that we can switch if needs change.  …always create and maintain a good 
abstraction for stuff like this.
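
As a rough idea of what that abstraction might look like, here is a minimal 
sketch.  Every name in it (RegistrationStore, InMemoryRegistrationStore, 
reregister) is hypothetical, as is the example URL; nothing here is an 
existing interface in our code.  The point is only that callers talk to the 
interface, so a Mongo-backed or Cassandra-backed class can be swapped in later.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class RegistrationStore(ABC):
    """Hypothetical interface for device registrations of push URLs."""

    @abstractmethod
    def put(self, device_id: str, push_url: str) -> None:
        """Store or overwrite the push URL for a device."""

    @abstractmethod
    def get(self, device_id: str) -> Optional[str]:
        """Return the push URL for a device, or None if unknown."""

    @abstractmethod
    def delete(self, device_id: str) -> None:
        """Forget a device; deleting an unknown device is not an error."""


class InMemoryRegistrationStore(RegistrationStore):
    """Stand-in backend for tests; a real backend would implement the
    same three methods against its own client library."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, device_id: str, push_url: str) -> None:
        self._rows[device_id] = push_url

    def get(self, device_id: str) -> Optional[str]:
        return self._rows.get(device_id)

    def delete(self, device_id: str) -> None:
        self._rows.pop(device_id, None)


def reregister(store: RegistrationStore, device_id: str, push_url: str) -> bool:
    """Repair a registration; returns True if the stored URL changed."""
    changed = store.get(device_id) != push_url
    store.put(device_id, push_url)
    return changed


if __name__ == "__main__":
    store = InMemoryRegistrationStore()
    print(reregister(store, "device-1", "https://push.example/abc"))
```

Since registrations are easily repaired, reregister simply overwrites 
whatever is there, which keeps the interface small enough for either backend.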
_______________________________________________
dev-media mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-media
