I'm not aware of anyone classifying what Twitter is doing today as 'working.' In fact, I believe that Twitter's problems are much larger than just technology, but that's a whole different subject.

What Twitter may have realized is that they don't have the resources of Facebook, that Facebook's use case is fairly limited (although a large deployment), and that they may have been trudging off into the great unknown.

Although I'm a fan of Cassandra, there's no way I'd use it today for my tier 1 deployments, because I don't have the resources of Facebook, and even though Cassandra is open source, that doesn't mean I can fix it when it goes down. And, because it's open source, there's no one to call to have it fixed reliably and within production constraints. Cassandra's strength is its greatest weakness right now.

The bloom is starting to come off NoSQL, which is normal - it means that people & firms are trying to do more with it and most probably realizing that all of the tools, support, infrastructure, etc. surrounding alternative solutions aren't such a bad thing. And that the world of NoSQL has to start coming up with a better mantra than "joins are bad, dude" and "you're just protecting the status quo." There's a *lot more* big data wrapped up inside of SQL databases and only a fraction of that in NoSQL - and there are a lot of reasons for it.

For example, do I *really* need Cassandra if MySQL will work for me and I just want to get up and running quickly without writing a bunch of code? My team was pushing more than 20k updates per second into, GASP, Oracle 5 years ago. Sure, it was expensive. But it worked. And it was worth it - or we wouldn't have spent the $$. What's your data worth if you don't have your data? Zero.

And then there's support - internal support. Picking a database du jour is organizationally expensive. Especially when there are probably one or two databases that Twitter could have bought off the shelf that would have solved their problems. But instead of bolstering the reliability and robustness of their internal architecture, they've gone and used very expensive equity for acquisitions. Running multiple databases in a fault-tolerant, geographically dispersed deployment isn't easy (yes, I've done it), and having multiple databases in the mix really complicates things. And at this stage in Twitter's growth, I frankly don't understand why they're looking to complicate their technological landscape any more than absolutely required.

So, this entire rant can be summarized really quite succinctly:

"If data is your business (like Facebook & Twitter), and you don't have the resources to cost-effectively handle all of your data management needs internally (Facebook does, Twitter doesn't), then basing your solution on unproven storage solutions (commercial or open source, SQL or NoSQL) is a risky and short-sighted strategy."

Please send death threats via the channels listed below:


Colin
+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro

On 7/10/2010 2:02 PM, Ryan King wrote:
On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia <martygree...@gmail.com> wrote:
It almost seems counter-intuitive. For analytics, you'd think they'd want a
database that supports more sophisticated query functionality (sql). Whereas
for everyday tweet storage, something fast and high-throughput (cassandra)
makes sense.

I'd be curious to hear the details as well.
These decisions aren't made in a vacuum. One of these use cases has an
existing system that works, one doesn't.

-ryan
