On Tue, Jan 12, 2010 at 10:14 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

> Jake,
>
> I wonder how often people need reliable transactions for
> realtime search? Maybe Mysql's t-log could be used sans the
> database part?
>

A reliable message queue - I'd imagine all the time!  Transactions... that
depends on how "ACID" you care about.  For social media, news, or log
monitoring you can miss some events, and transactionality isn't necessarily
the key part - just the ability to replay from some point in time, so you can
elastically replicate and handle server crashes (as well as do background
batch indexing followed by incremental realtime catchup).
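
To make "replay from some point in time" concrete, here's a rough sketch of
the kind of append-only event log I mean -- the class and method names are
made up for illustration, and this isn't Zoie's (or anyone's) actual API:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.util.ArrayList;
    import java.util.List;

    // Rough sketch only: a hypothetical append-only event log where every
    // record carries a monotonically increasing version, so an indexer (or a
    // freshly added replica) can replay everything after the last version it
    // saw.
    public class SimpleEventLog {

      public static class Event {
        public final long version;
        public final String payload;   // e.g. a serialized document
        public Event(long version, String payload) {
          this.version = version;
          this.payload = payload;
        }
      }

      private final File file;
      private long nextVersion;

      public SimpleEventLog(File file) throws IOException {
        this.file = file;
        long max = -1L;
        for (Event e : replayFrom(0)) {   // recover the high-water mark on startup
          max = Math.max(max, e.version);
        }
        this.nextVersion = max + 1;
      }

      // Append one event and close (flush), so a crash loses at most the
      // in-flight write.
      public synchronized long append(String payload) throws IOException {
        long version = nextVersion++;
        Writer w = new OutputStreamWriter(new FileOutputStream(file, true), "UTF-8");
        try {
          w.write(version + "\t" + payload.replace("\n", " ") + "\n");
        } finally {
          w.close();
        }
        return version;
      }

      // Replay every event at or after the given version - this is what lets a
      // replica catch up after a crash, or after background batch indexing.
      public synchronized List<Event> replayFrom(long fromVersion) throws IOException {
        List<Event> events = new ArrayList<Event>();
        if (!file.exists()) {
          return events;
        }
        BufferedReader r = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), "UTF-8"));
        try {
          String line;
          while ((line = r.readLine()) != null) {
            int tab = line.indexOf('\t');
            long version = Long.parseLong(line.substring(0, tab));
            if (version >= fromVersion) {
              events.add(new Event(version, line.substring(tab + 1)));
            }
          }
        } finally {
          r.close();
        }
        return events;
      }
    }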


> The created_at column for near realtime seems like it could hurt
> the database due to excessive polling? Has anyone tried it yet?
>

I haven't tried it in a production system, but in testing it only seemed to be
bad if you have a single DB (not replicated or sharded) but a fully replicated
search system (so having no separate message queue means you're spamming your
db with polling queries).  If your ratio of search shards to db shards isn't
too high (like, say, the non-distributed case where you have a handful of
replica indexes talking to one centralized db), then this wouldn't be as much
of a problem unless you go crazy and query every couple of milliseconds.
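
Concretely, the polling I have in mind is just something like this -- plain
JDBC, with a hypothetical "documents" table and an indexed created_at column;
the sleep at the bottom is the knob that decides how hard you hit the db:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;

    // Rough sketch of the created_at polling approach; table, column, and
    // connection details are made up for illustration.
    public class CreatedAtPoller {

      public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://localhost/content", "reader", "secret");
        Timestamp lastSeen = new Timestamp(0L);   // or load the indexer's checkpoint

        while (true) {
          PreparedStatement ps = conn.prepareStatement(
              "SELECT id, body, created_at FROM documents " +
              "WHERE created_at > ? ORDER BY created_at LIMIT 1000");
          ps.setTimestamp(1, lastSeen);
          ResultSet rs = ps.executeQuery();
          while (rs.next()) {
            // feed rs.getLong("id") / rs.getString("body") to the indexer here
            lastSeen = rs.getTimestamp("created_at");
          }
          rs.close();
          ps.close();

          // The poll interval is the whole story: every couple of seconds is
          // usually harmless, every couple of milliseconds is not.
          Thread.sleep(2000);
        }
      }
    }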


> > I wrote up a simple file-based indexing event log in an
> afternoon
>
> Right, however it's probably a long perilous leap from this to a t-log
> that's production ready.
>

Certainly.

> I'm waiting for someone to dive in and mess with BookKeeper
> http://wiki.apache.org/hadoop/BookKeeper and report back!
>

That could be great for this kind of thing.  Add some custom adapters to write
ledger entries which are easily translatable into Lucene Documents, and that
would hook right into Zoie rather easily.  BookKeeperStreamDataProvider!
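
Very roughly, something like this -- BookKeeper client calls plus the Lucene
3.x-era Field API; the ledger id, password, and payload format are all made
up, and a real adapter would sit behind Zoie's StreamDataProvider rather than
in a main():

    import java.util.Enumeration;

    import org.apache.bookkeeper.client.BookKeeper;
    import org.apache.bookkeeper.client.BookKeeper.DigestType;
    import org.apache.bookkeeper.client.LedgerEntry;
    import org.apache.bookkeeper.client.LedgerHandle;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Rough sketch only: read entries out of a BookKeeper ledger and turn each
    // one into a Lucene Document, using the entry id as a replay version.
    public class BookKeeperDocumentReader {

      public static void main(String[] args) throws Exception {
        BookKeeper bk = new BookKeeper("localhost:2181");   // ZooKeeper ensemble
        LedgerHandle ledger = bk.openLedger(1L, DigestType.MAC, "secret".getBytes());

        long last = ledger.getLastAddConfirmed();
        Enumeration<LedgerEntry> entries = ledger.readEntries(0, last);

        while (entries.hasMoreElements()) {
          LedgerEntry entry = entries.nextElement();

          // Assume each ledger entry payload is UTF-8 text; a real adapter
          // would use whatever serialization the writer side chose.
          String text = new String(entry.getEntry(), "UTF-8");

          Document doc = new Document();
          doc.add(new Field("id", Long.toString(entry.getEntryId()),
              Field.Store.YES, Field.Index.NOT_ANALYZED));
          doc.add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED));

          // Hand the Document (plus the entry id as its version) to the
          // indexing pipeline here.
          System.out.println("entry " + entry.getEntryId() + ": " + text);
        }

        ledger.close();
      }
    }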

  -jake
