Re: frequent keyword computation within a search ( and timeinterval )

Erick Erickson Thu, 05 Jan 2012 05:41:20 -0800

The time interval is just a RangeQuery in the Lucene
world. The rest is pretty standard search stuff.
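
To make the shape of the problem concrete, here is a minimal sketch in
plain Python, not Lucene code. It assumes documents are held in memory
with the four fields from the original question; the names Doc and
top_k_keywords are hypothetical and exist only for illustration. The
timestamp check plays the role of the range query, the optional term
check plays the role of the text query, and the counter does the
keyword tally.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Doc(NamedTuple):
    """Hypothetical record mirroring the fields in the question."""
    timestamp: int       # e.g. epoch seconds
    docid: str
    content_text: str
    keyword: str         # exactly one keyword per document


def top_k_keywords(
    docs: Iterable[Doc],
    start: int,
    end: int,
    k: int,
    query_term: Optional[str] = None,
) -> List[Tuple[str, int]]:
    """Return the k most frequent keywords among matching documents.

    A document matches if its timestamp lies in [start, end] and,
    when query_term is given, its content_text contains that term
    as a whitespace-separated token (case-insensitive).
    """
    counts: Counter = Counter()
    term = query_term.lower() if query_term else None
    for doc in docs:
        # Range filter on the timestamp (inclusive on both ends).
        if not (start <= doc.timestamp <= end):
            continue
        # Optional term filter; skipped when no query term is given.
        if term is not None and term not in doc.content_text.lower().split():
            continue
        counts[doc.keyword] += 1
    return counts.most_common(k)


# Made-up sample data, purely for illustration.
docs = [
    Doc(100, "d1", "cheap flights to paris", "travel"),
    Doc(110, "d2", "paris hotel deals", "travel"),
    Doc(120, "d3", "paris fashion week", "fashion"),
    Doc(130, "d5", "new phone launch", "tech"),
    Doc(500, "d4", "paris marathon", "sports"),
]
top_in_window = top_k_keywords(docs, start=100, end=200, k=2, query_term="paris")
```

A linear scan like this is only meant to show the logic. At the
read/write rates mentioned, an index would do the filtering, and the
whitespace tokenization here is a stand-in for whatever analysis the
real index applies.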


You probably want to have a look at the NRT
(near real time) stuff in trunk.

Your read/write rates are pretty high, so you'll need
to experiment a bit to size your installation
correctly.

Best
Erick

On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
<[email protected]> wrote:
> I have a requirement where reads and writes are quite high (at 100-500
> per second). A document has the following fields: timestamp,
> unique-docid, content-text, keyword. Average content-text length is
> ~20 bytes, and there is only 1 keyword for a given docid.
>
> At runtime, given a query-term (which could be null) and a
> time-interval, I need to find the top-k most frequent keywords among
> documents whose content-text field contains the query-term (the term
> filter is skipped if it is null) within that time-interval. I can purge
> the data every day, so I don't need to keep more than a day's data.
>
> I have quite a few options here: starting with MySQL, NoSQL stores
> (Cassandra, Mongo, Couch, Riak, Redis), and search-engine based
> (lucene/solr), each having its own pros/cons.
>
> In MySQL we can achieve this via a GROUP BY/COUNT clause.
> In NoSQL I can probably write a map/reduce task to compute these
> numbers, although I am not very sure about the query response time.
> Not sure if we can achieve it via lucene/solr OOB.
>
> Any suggestions on what would be a good choice for this use case ?
>
> -Thanks,
> prasenjit
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
