The time interval is just a RangeQuery in the Lucene world. The rest is pretty standard search stuff.
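
To make the shape of the problem concrete, here is a rough sketch in plain Java of the logic being asked for: restrict to a time range, optionally require a term in content-text, then count keywords and keep the top k. This is an in-memory illustration only, not Lucene code. The names `TopKeywords`, `Doc` and `topK` are hypothetical, and the substring check is a crude stand-in for real tokenized term matching. In Lucene the range restriction would be a range query on the timestamp field instead of the loop below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopKeywords {

    // Hypothetical in-memory stand-in for an indexed document.
    static final class Doc {
        final long timestamp;
        final String contentText;
        final String keyword;

        Doc(long timestamp, String contentText, String keyword) {
            this.timestamp = timestamp;
            this.contentText = contentText;
            this.keyword = keyword;
        }
    }

    // Returns up to k keywords, most frequent first, among docs whose timestamp
    // lies in [from, to] and whose content-text contains queryTerm (if non-null).
    static List<String> topK(List<Doc> docs, String queryTerm, long from, long to, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc d : docs) {
            boolean inRange = d.timestamp >= from && d.timestamp <= to;
            // Assumption: plain substring match instead of analyzed terms.
            boolean matches = queryTerm == null || d.contentText.contains(queryTerm);
            if (inRange && matches) {
                counts.merge(d.keyword, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count descending, then keyword ascending so ties are deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc(100L, "lucene index", "search"));
        docs.add(new Doc(150L, "lucene query", "search"));
        docs.add(new Doc(160L, "mysql group by", "sql"));
        docs.add(new Doc(900L, "lucene later", "late"));
        System.out.println(topK(docs, "lucene", 100L, 200L, 2)); // prints [search]
        System.out.println(topK(docs, null, 100L, 200L, 2));     // prints [search, sql]
    }
}
```

A linear scan like this obviously won't hold up at 100-500 operations per second over a day's data; it is only meant to pin down what "top-k keywords in a time interval" means before mapping it onto an index.
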

You probably want to have a look at the NRT (near real time) stuff in trunk. Your read/write rates are pretty high, so you'll need some experimentation to size your site correctly.

Best
Erick

On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee <prasen....@gmail.com> wrote:
> I have a requirement where reads and writes are quite high (100-500
> per sec). A document has the following fields: timestamp,
> unique-docid, content-text, keyword. Average content-text length is
> ~20 bytes, and there is only 1 keyword for a given docid.
>
> At runtime, given a query-term (which could be null) and a
> time-interval, I need to find the top-k most frequent keywords among
> documents that contain the query-term (optional if it is null) in
> their content-text field within that time-interval. I can purge the
> data every day, hence no need for me to have more than a day's data.
>
> I have quite a few options here: starting with MySQL, NoSQLs
> (Cassandra, Mongo, Couch, Riak, Redis), search-engine based
> (lucene/solr), each having its own pros/cons.
>
> In MySQL we can achieve this via a GROUP-BY/COUNT clause.
> In NoSQL I can probably write a map/reduce task to query these
> numbers, although I am not very sure about the query response time.
> Not sure if we can achieve it via lucene/solr OOB.
>
> Any suggestions on what would be a good choice for this use case?
>
> -Thanks,
> prasenjit
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org