RE: Can I do "Google Suggest" Like Search? - - - from - - -vikas

Vikas Khengare Wed, 24 May 2006 03:27:45 -0700

Hi Mark

You are right: I want suggestions drawn from document content only, not general words. What will happen if I send a PrefixQuery for each character the user types and fetch the results using AJAX? (The number of hits to show the user is not a problem.) So when the user types "a", the onkeyup handler will send a prefix query through AJAX to the search engine, and I will get the results back.

e.g. Field("Country","America")

Field("Country","Africa")

Field("Country","Argentina")

So if I search in "Country" for "a*", it will return all values starting with "a", which is the result I want.

Is this approach right?

Or is there another way to do it?

-----Original Message-----
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 24, 2006 3:37 PM
To: java-user@lucene.apache.org
Subject: Re: Can I do "Google Suggest" Like Search? - - - from - - -vikas

Tips:

1) Don't send to 3 mail lists when 1 will do. Please continue this conversation on java-user only.

2) Most "suggest" tools work off an index of previous searches (not documents). Do you have a large set of searches? If not, making sensible suggestions based on document content can be much more compute intensive. My assumption here is you are having to work with doc content.

3) You don't need to go to the expense of running a query and ranking and scoring documents. Look at the lower-level APIs terms() and termDocs() and use them to find the matching terms.
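As a rough illustration of tip 3, the sketch below walks a sorted term list from the typed prefix and stops as soon as the prefix no longer matches. It does not call Lucene: the TreeSet is only an assumed stand-in for one field's sorted term dictionary, and the class and method names are hypothetical. With a real index, a term enumeration obtained from terms() would play the role of the TreeSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the sorted term dictionary of a single field.
class PrefixTermSketch {

    // Collects up to maxSuggestions terms that start with prefix, in sorted order.
    static List<String> suggest(TreeSet<String> sortedTerms, String prefix, int maxSuggestions) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix) begins at the first term >= prefix, so every term
        // sharing the prefix appears in one contiguous run from here.
        for (String term : sortedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || matches.size() >= maxSuggestions) {
                break; // sorted order: no later term can match once one fails
            }
            matches.add(term);
        }
        return matches;
    }
}
```

No documents are scored or ranked here; only the term list is scanned, which is the saving tip 3 is after.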

4) Word suggestions ideally shouldn't be independent of each other. Look at the completed words in the query string and use them to inform the selection of suggestions for the incomplete term being typed. The termDocs()/termPositions() APIs give you all the data you need to establish what docs/positions exist for completed terms, and these can be cross-referenced with the list of docs/positions for the "alternative" terms under consideration. A high proximity between completed term occurrences and a suggested term's occurrences makes a strong candidate. A fast way to do proximity tests might be to compare sorted arrays of numbers, where each number represents a term occurrence, using a function like:

termspaceNumber = (DocNumber * maxNumTermsPerDoc) + termPositionInDoc

You could then compare long[] completedTermOccurences with long[] suggestedAlternativeTermOccurences, looking for matches where numbers differ by 1 or 2.
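A minimal sketch of that comparison, assuming both arrays are already sorted ascending and that maxNumTermsPerDoc is a known upper bound on positions within any one document. The class name, method names, and the maxGap parameter are hypothetical, and the arrays would in practice be filled from termPositions().

```java
class ProximitySketch {

    // Maps a (document, position) pair to one number, as in the formula above.
    static long termspaceNumber(int docNumber, int termPositionInDoc, long maxNumTermsPerDoc) {
        return docNumber * maxNumTermsPerDoc + termPositionInDoc;
    }

    // Counts completed-term occurrences that have at least one suggested-term
    // occurrence within maxGap positions (a difference of 0 also counts).
    // Both arrays must be sorted ascending.
    static int countNearMatches(long[] completed, long[] suggested, long maxGap) {
        int count = 0;
        int j = 0;
        for (long c : completed) {
            // Skip suggested occurrences that lie too far below c. Because
            // completed is sorted, j never needs to move backwards.
            while (j < suggested.length && suggested[j] < c - maxGap) {
                j++;
            }
            if (j < suggested.length && suggested[j] <= c + maxGap) {
                count++;
            }
        }
        return count;
    }
}
```

One caveat of the single-number encoding: the last positions of one document and the first positions of the next can look adjacent when a document's length is close to maxNumTermsPerDoc, so the count is an approximation.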

A faster (rougher) comparison that ignores word proximity would be just to compare bitsets of doc ids, looking for high levels of overlap (intersection/union).
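The bitset comparison could look roughly like this, using java.util.BitSet with one bit per doc id. The class and method names are hypothetical; filling the bitsets from termDocs() is left out.

```java
import java.util.BitSet;

class DocOverlapSketch {

    // Returns intersection size divided by union size for two doc-id bitsets,
    // or 0.0 when both are empty. The inputs are not modified.
    static double overlap(BitSet docsA, BitSet docsB) {
        BitSet intersection = (BitSet) docsA.clone();
        intersection.and(docsB);
        BitSet union = (BitSet) docsA.clone();
        union.or(docsB);
        int unionSize = union.cardinality();
        return unionSize == 0 ? 0.0 : (double) intersection.cardinality() / unionSize;
    }
}
```

A value near 1.0 means the completed term and the suggested term occur in nearly the same set of documents.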

You can use TermEnum.docFreq() to quickly rule out very rare words from your calculations.
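A small sketch of that filtering step, assuming the document frequencies have already been read (for example from TermEnum.docFreq()) into a map. The map, the minDocFreq threshold, and the names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RareTermFilterSketch {

    // Keeps only candidates that occur in at least minDocFreq documents.
    // Terms missing from the map are treated as unseen and dropped.
    static List<String> dropRareTerms(List<String> candidates,
                                      Map<String, Integer> docFreqByTerm,
                                      int minDocFreq) {
        List<String> kept = new ArrayList<>();
        for (String term : candidates) {
            Integer docFreq = docFreqByTerm.get(term);
            if (docFreq != null && docFreq >= minDocFreq) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

Running this before the proximity or bitset comparison keeps the more expensive checks to terms that are common enough to be useful suggestions.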

Cheers,

Mark

==================================================================================================

with best regards

from .........

vikas r. khengare

Veritas Software India Private Ltd.

Symantec Corporation

Pune, India

[ Enjoy your life today.... because yesterday has gone.... and tomorrow may never come. ]
