Re: Performance issue

mittals Tue, 03 Feb 2009 03:47:04 -0800

Hi,

All of the fields are text. We have 3 @IndexedEmbedded objects within one
object.
Users can search for any string; a few example searches are morgan, john, orcl,
healthcare, pa ma, ma fin.


Here is an example of one of our queries:

(+(trading.tn:pa^150.0 lastName:pa*^45.0 firstName:pa*^30.0 roleDesc:pa^3.0
roleShortDesc:pa^3.0 phoneDesc:pa^9.0 cityDesc:pa^3.0) +(trading.tn:ma^150.0
lastName:ma*^45.0 firstName:ma*^30.0 roleDesc:ma^3.0 roleShortDesc:ma^3.0
phoneDesc:ma^9.0 cityDesc:ma^3.0))

Another question is how Lucene merges the results of 2 terms. In the case above
we have an AND operator, so Lucene has to merge the results of the pa and ma terms.
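
To make the merge question concrete, here is a minimal sketch of how an AND over
two groups can be evaluated, assuming each group yields its matching document ids
in increasing order. It is plain Java for illustration only, not Lucene's actual
scorer code; the class name and the sample ids are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctionSketch {
    /** Intersects two ascending arrays of document ids. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {        // document matches both groups
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {  // advance whichever side is behind
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pa = {1, 4, 7, 9};   // hypothetical docs matching the "pa" group
        int[] ma = {2, 4, 9, 12};  // hypothetical docs matching the "ma" group
        System.out.println(intersect(pa, ma)); // prints [4, 9]
    }
}
```

Only documents present in both lists survive, so the cost depends on how long each
list is, and a two-letter prefix makes each list long.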

Regards,
Sourabh


Matthew Hall-7 wrote:
> 
> Do you NEED to be using 7 fields here?
> 
> Like Erick said, if you could give us an example of the types of data 
> you are trying to search against, it would be quite helpful.
> 
> It's possible that you could, say, collapse your 7 fields down
> to a single field, which would likely reduce the overall number of OR
> clauses in your searches, speeding things up nicely.
> 
> On my project we run two-letter prefix searches in under a second, against
> much larger datasets.  A lot of this, however, is directly due to how our
> indexes are structured.
> 
> -Matt
> 
> Erick Erickson wrote:
>> Prefix queries are expensive here. The problem is
>> that each one forms a very large OR clause on all
>> the terms that start with those two letters. For instance,
>> if a field in your index contained
>> mine
>> milanta
>> mica
>>
>> a prefix search on "mi" would form
>> mine OR milanta OR mica.
>>
>> Doing this across seven fields could get expensive.
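
The expansion Erick describes can be sketched in plain Java, assuming a sorted
term dictionary for a single field. This only illustrates the idea and is not
Lucene's PrefixQuery implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixExpansionSketch {
    /** Returns every term in the sorted dictionary that starts with the prefix. */
    static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> out = new ArrayList<String>();
        // tailSet jumps to the first term >= prefix; matches are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term sharing the prefix
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<String>(
                Arrays.asList("mine", "milanta", "mica", "john", "morgan"));
        // Each returned term becomes one clause of the OR.
        System.out.println(expand(terms, "mi")); // prints [mica, milanta, mine]
    }
}
```

The shorter the prefix, the more terms fall in the range, so the OR grows, and
that happens again for every field that is searched by prefix.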
>>
>> Two things:
>> 1> what is the problem you are trying to solve? Perhaps some
>> of the folks on the list can give you some suggestions. You can
>> think about many strategies depending upon what you want
>> to accomplish. A 300M index isn't very big, so you could, for
>> instance, think about indexing a separate field that contains only
>> the two beginning letters and search *that* in this case. I'll
>> assume that three letter prefix queries are OK.
>>
>> 2> How are you measuring query time? If you're measuring the
>> time it takes when you first start a searcher, be aware that the
>> first few queries are usually slow because the caches haven't
>> been filled. Further, are you measuring total response time or
>> are you measuring *just* the query time? It's possible that the
>> time is being spent assembling the response in your code
>> rather than actual searching. You might insert some timers
>> to determine that.
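
The separate two-letter field from point 1 might look roughly like this. It is a
plain-Java sketch with an in-memory map standing in for the extra index field;
the name prefix2, the whitespace tokenization, and the sample documents are
assumptions for illustration, not part of our setup or of Lucene.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class TwoLetterPrefixFieldSketch {
    // Hypothetical extra field: two-letter prefix -> ids of docs having a token that starts with it.
    private final Map<String, SortedSet<Integer>> prefix2 =
            new HashMap<String, SortedSet<Integer>>();

    /** Lower-cased first two characters of a token, or null if the token is shorter. */
    static String key(String token) {
        return token.length() < 2 ? null : token.substring(0, 2).toLowerCase(Locale.ROOT);
    }

    /** Adds the two-letter key of every token in the text to the extra field. */
    void index(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            String k = key(token);
            if (k == null) {
                continue;
            }
            SortedSet<Integer> docs = prefix2.get(k);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                prefix2.put(k, docs);
            }
            docs.add(docId);
        }
    }

    /** A two-letter search becomes a single exact lookup instead of a large OR. */
    SortedSet<Integer> search(String twoLetters) {
        SortedSet<Integer> docs = prefix2.get(twoLetters.toLowerCase(Locale.ROOT));
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    public static void main(String[] args) {
        TwoLetterPrefixFieldSketch idx = new TwoLetterPrefixFieldSketch();
        idx.index(1, "Morgan Stanley");
        idx.index(2, "Mary Parker");
        idx.index(3, "John Patel");
        System.out.println(idx.search("pa")); // prints [2, 3]
        System.out.println(idx.search("ma")); // prints [2]
    }
}
```

The trade-off is a slightly larger index in exchange for turning the two-letter
case into an exact-term match.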
>>
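
For point 2, a small timing helper can separate search time from the time spent
assembling the response. This is a sketch using only System.nanoTime; the Step
interface is a hypothetical helper, and the two steps in main are stand-ins for
the real search call and response building.

```java
import java.util.ArrayList;
import java.util.List;

final class TimingSketch {
    /** A unit of work to be timed (hypothetical helper interface). */
    interface Step<T> {
        T run();
    }

    /** Runs the step, prints its elapsed wall-clock time, and returns its result. */
    static <T> T timed(String label, Step<T> step) {
        long start = System.nanoTime();
        try {
            return step.run();
        } finally {
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(label + ": " + ms + " ms");
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real search call.
        final List<Integer> hits = timed("search", new Step<List<Integer>>() {
            public List<Integer> run() {
                List<Integer> ids = new ArrayList<Integer>();
                for (int i = 0; i < 100; i++) {
                    ids.add(i);
                }
                return ids;
            }
        });
        // Stand-in for building the response from the hits.
        String response = timed("assemble response", new Step<String>() {
            public String run() {
                StringBuilder sb = new StringBuilder();
                for (Integer id : hits) {
                    sb.append(id).append('\n');
                }
                return sb.toString();
            }
        });
        System.out.println(response.length() + " chars in response");
    }
}
```

Running a few warm-up queries before taking measurements keeps the cold-cache
cost of the first searches out of the numbers.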
>> Best
>> Erick
>>
>> On Mon, Feb 2, 2009 at 2:58 AM, Mittal, Sourabh (IDEAS) <
>> [email protected]> wrote:
>>
>>   
>>> Hi All,
>>>
>>> We face serious performance issues when users do a 2-letter search, e.g. ho,
>>> jo, pa ma, um ar, ma fi, etc. The time taken is between 10 and 15 seconds.
>>> Below are our implementation details:
>>>
>>> 1. Search runs on 7 fields.
>>> 2. PrefixQuery implementation on all fields.
>>> 3. AND search.
>>> 4. Our index size is 300 MB.
>>> 5. We show only the top 100 documents, ranked by score.
>>> 6. We use StandardAnalyzer & StandardTokenizer for indexing &
>>> searching.
>>> 7. Lucene 2.4
>>> 8. JDK 1.6
>>>
>>> Please suggest how we can improve the performance.
>>>
>>> Regards,
>>> Sourabh Mittal
>>> Morgan Stanley | IDEAS Practice Areas
>>> Manikchand Ikon | South Wing 18 | Dhole Patil Road
>>> Pune, 411001
>>> Phone: +91 20 2620-7053
>>> [email protected]
>>>
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Performance-issue-tp21788313p21808283.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

