The thing is, the NoSQL stuff is pretty much just a key-value store.
There's generally no way to "query" the store; instead, all you can do is
look up a document by ID.
If this meets the needs of your application (all you need is a key-value
store, not any kind of query), then it's definitely going to be a lot
less overhead than an actual SQL rdbms, and simpler to manage, with
advantages for scalability and replication etc. The reason it's simpler
and more performant is, well, because it's _simpler_: you don't actually
have querying or joining abilities.
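To make the distinction concrete, here's a minimal sketch using a plain Python dict as a stand-in for a key-value store (the names `store`, `get_by_id` and `find_by_year` and the sample records are hypothetical, not any real NoSQL product's API). Lookup by ID is a single hash lookup; finding documents by any other field means scanning every document, because nothing but the key is indexed.

```python
from typing import Dict, List

# Hypothetical stand-in for a key-value store: document ID -> document.
# Real products have their own APIs; this only illustrates the access pattern.
store: Dict[str, dict] = {
    "doc1": {"title": "Cataloging rules", "year": 2008},
    "doc2": {"title": "Metadata harvesting", "year": 2010},
    "doc3": {"title": "Link resolvers", "year": 2010},
}

def get_by_id(doc_id: str) -> dict:
    # The one thing a pure key-value store is built for: lookup by key.
    return store[doc_id]

def find_by_year(year: int) -> List[str]:
    # No index on "year", so the only option is to scan every document.
    return [doc_id for doc_id, doc in store.items() if doc.get("year") == year]

print(get_by_id("doc2")["title"])  # Metadata harvesting
print(find_by_year(2010))          # ['doc2', 'doc3']
```

With tens of millions of documents, that scan is exactly the cost a query engine (or a separate index) exists to avoid.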
But if you are actually going to need querying on values other than
ID... an SQL rdbms is a pretty standardized, well-understood way to do
this. There are certainly other ways -- you could combine a "noSQL"
key-value store with Solr/Lucene, for instance, which in some cases may
get you even better performance and more flexibility than an rdbms
solution. But it's (IMO) going to be a bit harder to set up, manage,
and use in your favorite development environment than an rdbms,
precisely because the rdbms is such a time-tested, standardized, mature
approach.
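A rough sketch of the "key-value store plus search index" combination, assuming a toy in-memory inverted index as a stand-in for Solr/Lucene (this is not Solr's or Lucene's API; `put`, `search` and the sample records are hypothetical). The index answers the query with IDs, and the documents themselves are then fetched from the key-value store by ID.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical key-value store: document ID -> document.
store: Dict[str, dict] = {}

# Toy inverted index: term -> IDs of documents containing that term.
# It only mimics the basic idea behind Solr/Lucene; it is not their API.
index: Dict[str, Set[str]] = defaultdict(set)

def tokenize(text: str) -> List[str]:
    # Lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def put(doc_id: str, doc: dict) -> None:
    # Write the document to the key-value store...
    store[doc_id] = doc
    # ...and also index its abstract so it can be found by content, not just by ID.
    for term in tokenize(doc.get("abstract", "")):
        index[term].add(doc_id)

def search(query: str) -> List[dict]:
    # AND semantics: keep only IDs that contain every query term.
    terms = tokenize(query)
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    # The index only hands back IDs; the documents come from the key-value store.
    return [store[doc_id] for doc_id in sorted(ids)]

put("a1", {"title": "Open access mandates", "abstract": "Open access policies at universities."})
put("a2", {"title": "Repository software", "abstract": "Comparing institutional repository platforms."})
put("a3", {"title": "OA repositories", "abstract": "Open repository platforms and access."})

print([d["title"] for d in search("open access")])  # ['Open access mandates', 'OA repositories']
```

Note that every write now has to update two things, the store and the index, and keeping them in sync is a big part of the extra setup and management cost of the combined approach.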
So, as usual, the right tool for the job. If all you really need is a
key-value store on ID, then a "NoSQL" solution may be the right thing.
But if you need actual querying and joining, then personally I'd stick
with an rdbms unless I had some concrete reason to think a more complicated
"nosql"+solr solution was required. Certainly if you are planning on
using Solr _anyway_ because your application is a search engine of some
type, that would lessen the incremental 'cost' of a nosql+solr solution.
[ Note that if all you want is a "schemaless" storage, you CAN just
stick large chunks of binary or text in an rdbms 'blob' or 'text'
column. You won't be able to efficiently search on these -- but you
aren't able to efficiently search in a 'nosql' solution either. So you
_can_ use an rdbms like a "nosql" solution to store arbitrary data, no
problem. If you're using an rdbms, you can have _other_ columns in
addition to your blob/text one, which you can populate and use in
SELECTs and JOINs. If you _aren't_ going to need those, there'd be no
reason to do it in an rdbms (even though you could); you would indeed
then just want to use a 'nosql' key-value store solution, which will
perform better. So again, I think the conclusion is that rdbms is _more
powerful_ than nosql, but that power comes with a performance cost. If
you don't need it, nosql. If you do need it -- there's no reason you
can't store "structureless" units of data in text/blob in an rdbms too. ]
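Here's a minimal sketch of that hybrid using Python's built-in sqlite3 module, assuming JSON text as the "schemaless" payload; the table and column names and the sample records are hypothetical. The payload sits in a TEXT column (a BLOB would work the same way), while `year` and `journal_id` are ordinary columns that exist only so you can filter and join.

```python
import json
import sqlite3
from typing import List, Optional, Tuple

conn = sqlite3.connect(":memory:")

# "documents" keeps the arbitrary payload in a TEXT column, plus a couple of
# ordinary columns that exist purely so we can SELECT and JOIN on them.
conn.execute("CREATE TABLE journals (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " year INTEGER,"
    " journal_id INTEGER REFERENCES journals(id),"
    " body TEXT NOT NULL)"
)
conn.execute("CREATE INDEX documents_year ON documents(year)")

conn.executemany("INSERT INTO journals (id, name) VALUES (?, ?)",
                 [(1, "Journal A"), (2, "Journal B")])

# Hypothetical sample records: citation, abstract, and full-text location.
docs = [
    (1, 2009, 1, {"title": "First article", "abstract": "Abstract one", "fulltext": "http://example.org/1"}),
    (2, 2010, 2, {"title": "Second article", "abstract": "Abstract two", "fulltext": "http://example.org/2"}),
    (3, 2010, 1, {"title": "Third article", "abstract": "Abstract three", "fulltext": "http://example.org/3"}),
]
conn.executemany(
    "INSERT INTO documents (id, year, journal_id, body) VALUES (?, ?, ?, ?)",
    [(i, y, j, json.dumps(body)) for i, y, j, body in docs],
)

def get_document(doc_id: int) -> Optional[dict]:
    # Key-value-style access: fetch the opaque payload by primary key.
    row = conn.execute("SELECT body FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None

def documents_from_year(year: int) -> List[Tuple[int, str]]:
    # The part a pure key-value store can't do: filter and join on the extra columns.
    return conn.execute(
        "SELECT d.id, j.name FROM documents AS d"
        " JOIN journals AS j ON j.id = d.journal_id"
        " WHERE d.year = ? ORDER BY d.id",
        (year,),
    ).fetchall()

print(get_document(2)["title"])   # Second article
print(documents_from_year(2010))  # [(2, 'Journal B'), (3, 'Journal A')]
```

The JSON in `body` is still opaque to efficient searching, just as in a key-value store; only the extra columns are cheap to query.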
Peter Schlumpf wrote:
I'd opt for the first response. I hope NoSQL is not a flash in the pan. It
makes eminent sense to me. SQL is just one way of looking at data. A level of
abstraction. What authority says that SQL is the only or the best way of
looking at a dataset? Or the MARC record format for that matter? They
certainly weren't inscribed on stone tablets. These things can become mind
prisons. I think it's refreshing that there are those willing to look at
databases beyond SQL.
Peter Schlumpf
www.avantilibrarysystems.com
-----Original Message-----
From: Thomas Dowling
Sent: Apr 12, 2010 10:55 AM
Subject: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
So let's say (hypothetically, of course) that a colleague tells you he's
considering a NoSQL database like MongoDB or CouchDB, to store a couple
tens of millions of "documents", where a document is pretty much an
article citation, abstract, and the location of full text (not the full
text itself). Would your reaction be:
"That's a sensible, forward-looking approach. Lots of sites are putting
lots of data into these databases and they'll only get better."
"This guy's on the bleeding edge. Personally, I'd hold off, but it could
work."
"Schedule that 2012 re-migration to Oracle or Postgres now."
"Bwahahahah!!!"
Or something else?
(<http://en.wikipedia.org/wiki/NoSQL> is a good jumping-in point.)
--
Thomas Dowling