On Tue, Jan 12, 2010 at 10:14 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> Jake,
>
> I wonder how often people need reliable transactions for
> realtime search?  Maybe MySQL's t-log could be used sans the
> database part?
>

A reliable message queue - I'd imagine all the time!  Transactions... that
depends on how "ACID" you care about.  For social media, news, and log
monitoring, you can miss some events, and transactionality isn't necessarily
the key part - just the ability to replay from some point in time, so you
can elastically replicate and handle server crashes (as well as do
background batch indexing followed by incremental realtime catchup).

> The created_at column for near realtime seems like it could hurt
> the database due to excessive polling?  Has anyone tried it yet?
>

I haven't tried it in a production system, but in testing it only seemed to
be bad if you have a single, non-replicated, non-sharded DB but a fully
replicated search system (so having no separate message queue means you're
spamming your db with polling queries).  If your ratio of search shards to
db shards isn't too high (say, the non-distributed case where you have a
handful of replica indexes talking to one centralized db), then this
wouldn't be as much of a problem unless you go crazy and query every couple
of milliseconds (see the rough polling sketch at the end of this mail).

> > I wrote up a simple file-based indexing event log in an afternoon
>
> Right, however it's probably a long perilous leap from this to a t-log
> that's production ready.
> Certainly.  I'm waiting for someone to dive in and mess with Bookkeeper
> http://wiki.apache.org/hadoop/BookKeeper and report back!
>

That could be great for this kind of thing.  Add some custom adapters to
write ledger entries which are easily translatable into Lucene Documents,
and that would hook right into Zoie rather easily.
BookKeeperStreamDataProvider!

  -jake
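For anyone curious what the created_at polling looks like in practice, here
is a minimal sketch.  The table, columns, and connection string are made up
for the example, and indexRow() is just a placeholder for whatever pushes
the row into Lucene/Zoie:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;

    // Minimal created_at polling loop: each search node repeatedly asks the
    // DB for rows newer than the last timestamp it indexed.  The sleep at the
    // bottom is the knob that trades index freshness against load on the DB.
    public class CreatedAtPoller {
      public static void main(String[] args) throws Exception {
        // Connection details are invented for the example.
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://dbhost/content", "reader", "secret");
        PreparedStatement ps = conn.prepareStatement(
            "SELECT id, body, created_at FROM events"
            + " WHERE created_at > ? ORDER BY created_at");

        Timestamp lastSeen = new Timestamp(0L);  // or a persisted checkpoint
        while (true) {
          ps.setTimestamp(1, lastSeen);
          ResultSet rs = ps.executeQuery();
          while (rs.next()) {
            indexRow(rs.getLong("id"), rs.getString("body"));
            lastSeen = rs.getTimestamp("created_at");
          }
          rs.close();
          Thread.sleep(1000L);  // don't "query every couple of milliseconds"
        }
      }

      // Placeholder for handing the row to the indexing pipeline.
      private static void indexRow(long id, String body) { }
    }

And a very rough cut of the BookKeeper-to-index direction: just drain a
ledger and hand each entry off to the indexer.  The ZooKeeper address,
ledger id, password, and the handOffToIndexer() hook are placeholders, a
real BookKeeperStreamDataProvider would wrap this behind whatever interface
Zoie's data providers expect, and the exact client calls should be
double-checked against the BookKeeper docs:

    import java.util.Enumeration;

    import org.apache.bookkeeper.client.BookKeeper;
    import org.apache.bookkeeper.client.LedgerEntry;
    import org.apache.bookkeeper.client.LedgerHandle;

    // Sketch of the adapter idea: read a BookKeeper ledger and treat each
    // entry as one serialized indexing event.
    public class BookKeeperIndexFeed {
      public static void main(String[] args) throws Exception {
        BookKeeper bk = new BookKeeper("zkhost:2181");
        LedgerHandle ledger = bk.openLedger(
            1L, BookKeeper.DigestType.MAC, "secret".getBytes());

        long last = ledger.getLastAddConfirmed();
        if (last >= 0) {
          Enumeration<LedgerEntry> entries = ledger.readEntries(0L, last);
          while (entries.hasMoreElements()) {
            LedgerEntry entry = entries.nextElement();
            // Decode the entry bytes and hand them to the indexer; this is
            // where the bytes would become a Lucene Document (or a Zoie
            // data event) in the real adapter.
            handOffToIndexer(entry.getEntryId(), entry.getEntry());
          }
        }
        ledger.close();
      }

      // Placeholder for "translate the bytes into a Lucene Document and index it".
      private static void handOffToIndexer(long entryId, byte[] payload) { }
    }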