On Thu, Jan 7, 2010 at 10:43 AM, Nathan McCall <n...@vervewireless.com> wrote:
> Agreed that there is not much to go on here in the original question.
> I will say that we very recently found a good fit with Solr and
> Cassandra in how we deal with a very heavy write volume of news
> article data. Cassandra is excellent with write throughput and high
> availability, but our search use cases are with time-dependent news
> content, so we need lots of term proximity, faceting and ordering
> functionality.
>
> We probably could store everything in Solr, but the above approach
...

I think that in many (most?) cases, the optimal solutions for searching
and for lookups are different.

Traditionally this has meant that instead of trying to cram everything
into Oracle (or MySQL, or Postgres) with its built-in,
not-quite-as-good-as-Lucene text indexer, the right thing to do is to use
both: a DB for storing data, lookups and aggregates, and a search index
for full-text search. For some reason, using two tools instead of one
seems a very unintuitive notion, even when they have different sweet
spots.
And going forward, similar trade-offs will need to be made between
'traditional' RDBMSs, newer distributed, highly available, eventually
consistent data stores (with many variations, from simple lookups to
sorted access), search indexing, and batch-oriented processing (Hadoop /
map/reduce).
Trying to do too many things with just one kind of tool tends to lead
to scalability and maintenance problems.
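To make the DB-plus-search-index split a bit more concrete, here is a minimal, purely illustrative Python sketch. It uses in-memory stand-ins rather than any real database or Lucene/Solr API; `RecordStore`, `SearchIndex` and `save_article` are hypothetical names. The store keeps full records and answers keyed lookups and a simple aggregate, while a toy inverted index answers term queries and returns keys that are then resolved against the store.

```python
import re
from collections import defaultdict


class RecordStore:
    """Hypothetical stand-in for the DB: full records, keyed lookups, aggregates."""

    def __init__(self):
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = record

    def get(self, key):
        return self._rows.get(key)

    def count_by(self, field):
        # Simple aggregate: number of records per distinct value of `field`.
        counts = defaultdict(int)
        for record in self._rows.values():
            counts[record[field]] += 1
        return dict(counts)


class SearchIndex:
    """Hypothetical stand-in for a search index: a toy inverted index (term -> keys)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, key, text):
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(key)

    def search(self, query):
        # AND semantics: return keys whose text contains every query term.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


def save_article(store, index, key, article):
    # Full record goes to the store; only the searchable text goes to the index.
    store.put(key, article)
    index.add(key, article["title"] + " " + article["body"])


store, index = RecordStore(), SearchIndex()
save_article(store, index, "a1", {"section": "tech", "title": "Cassandra release", "body": "New write path"})
save_article(store, index, "a2", {"section": "tech", "title": "Solr faceting", "body": "Faceting and proximity search"})
save_article(store, index, "a3", {"section": "sports", "title": "Cup final", "body": "Late winner"})

hits = index.search("faceting search")              # full-text query -> keys
print([store.get(k)["title"] for k in sorted(hits)])  # resolve keys via the store
print(store.count_by("section"))                    # aggregate stays in the store
```

Note that the two writes in `save_article` are not atomic, so in a real system the index can lag behind or diverge from the store; that is part of the cost of using two tools.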

I am actually trying to decide, for a similar case, which tools (from a
loose set of Cassandra, Lucene/Solr and Voldemort) to use for processing
large amounts of data, and I'm pretty sure I will end up using more
than one.

-+ Tatu +-
