Re: frequent keyword computation within a search ( and timeinterval )

Jason Rutherglen Thu, 05 Jan 2012 16:24:00 -0800

> Short answer is that no, there isn't an aggregate
> function. And you shouldn't even try


If that is the case, why does a 'stats' component exist for Solr with
the SUM function built in?

http://wiki.apache.org/solr/StatsComponent
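
For illustration, here is a minimal sketch of how the SUM(price) WHERE
item=fruits example from earlier in the thread might be expressed as a
StatsComponent request. The host, port and handler path are assumptions, and
the field names item and price are taken from that SQL example rather than from
any real schema. The snippet only builds the URL; it does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port and handler path are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_stats_url(base_url, query, stats_field):
    """Build a select URL that asks StatsComponent for statistics on one field."""
    params = [
        ("q", query),                  # restrict to matching documents
        ("rows", "0"),                 # only the statistics are wanted, not the hits
        ("stats", "true"),             # switch the stats component on
        ("stats.field", stats_field),  # numeric field to aggregate
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


url = build_stats_url(SOLR_SELECT_URL, "item:fruits", "price")
print(url)
```

If the field is numeric and indexed as the component expects, the stats section
of the response should report a sum for price alongside the other statistics.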

On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson <[email protected]> wrote:
> You will encounter endless grief until you stop
> thinking of Solr/Lucene as a replacement for
> an RDBMS. It is a *text search engine*.
> Whenever you start asking "how do I implement
> a SQL statement in Solr", you have to stop
> and reconsider *why* you are trying to do that.
> Then recast the question in terms of searching.
>
> Short answer is that no, there isn't an aggregate
> function. And you shouldn't even try.
>
> Best
> Erick
>
> On Thu, Jan 5, 2012 at 12:53 PM, prasenjit mukherjee
> <[email protected]> wrote:
>> Thanks Erick for the response.
>>
>> Will lucene/solr provide me aggregations ( of field values ) satisfying
>> a query criteria ? e.g. SELECT SUM(price) WHERE item=fruits
>>
>> Or do I need to use a HitCollector to achieve that ?
>>
>> Any sample solr/lucene query to compute aggregates ( like SUM ) will be great.
>>
>> -Thanks,
>> Prasenjit
>>
>> On Thu, Jan 5, 2012 at 7:10 PM, Erick Erickson <[email protected]> wrote:
>>> The time interval is just a RangeQuery in the Lucene
>>> world. The rest is pretty standard search stuff.
>>>
>>> You probably want to have a look at the NRT
>>> (near real time) stuff in trunk.
>>>
>>> Your reads/writes are pretty high, so you'll need
>>> some experimentation to size your site
>>> correctly.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
>>> <[email protected]> wrote:
>>>> I have a requirement where reads and writes are quite high ( @ 100-500
>>>> per-sec ). A document has the following fields : timestamp,
>>>> unique-docid,  content-text, keyword. Average content-text length is ~
>>>> 20 bytes, there is only 1 keyword for a given docid.
>>>>
>>>> At runtime, given a query-term ( which could be null ) and a
>>>> time-interval, I need to find the top-k most frequent keywords among
>>>> documents whose content-text field contains the query-term ( skipped
>>>> if it is null ) within that time-interval. I can purge the data every
>>>> day, hence no need for me to keep more than a day's data.
>>>>
>>>> I have quite a few options here : Starting with MySQL, NoSQLs (
>>>> Cassandra, Mongo, Couch, Riak, Redis ) , Search-Engine based (
>>>> lucene/solr ) each having its own pros/cons.
>>>>
>>>> In MySQL we can achieve this via a GROUP BY / COUNT clause.
>>>> In NoSQL I can probably write a map/reduce task to query these
>>>> numbers, although I am not very sure about the query response time.
>>>> Not sure if we can achieve it via lucene/solr OOB.
>>>>
>>>> Any suggestions on what would be a good choice for this use case ?
>>>>
>>>> -Thanks,
>>>> prasenjit
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
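
To make the original requirement concrete, the following is a small in-memory
sketch of what the GROUP BY / COUNT approach would compute: filter documents by
time interval and optional query term, then count keywords and keep the top k.
It is not Lucene/Solr code. The field name content_text, the integer
timestamps and the plain substring match are assumptions made for illustration.

```python
from collections import Counter


def top_k_keywords(docs, start, end, k, term=None):
    """Return the k most frequent keywords among docs with start <= timestamp <= end.

    docs: iterable of dicts with 'timestamp', 'content_text' and 'keyword'.
    term: optional substring that content_text must contain; None matches all.
    """
    counts = Counter(
        d["keyword"]
        for d in docs
        if start <= d["timestamp"] <= end
        and (term is None or term in d["content_text"])
    )
    return counts.most_common(k)


# Made-up sample documents, one keyword per document as in the question.
docs = [
    {"timestamp": 10, "content_text": "fresh apple pie", "keyword": "fruit"},
    {"timestamp": 20, "content_text": "apple juice", "keyword": "drink"},
    {"timestamp": 30, "content_text": "green apple", "keyword": "fruit"},
    {"timestamp": 99, "content_text": "apple tart", "keyword": "dessert"},
]
result = top_k_keywords(docs, start=0, end=50, k=2, term="apple")
print(result)  # [('fruit', 2), ('drink', 1)]
```

A linear scan like this only shows the semantics; at 100-500 reads per second
an index-backed store would be needed to do the filtering and counting.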
