Almost all documents change daily. So one way I am thinking of updating index is creating new index set in a temp directory by other process. When time comes to update index then renaming in-use dir to archive dir and renaming temp dir to in-use dir. I am not sure whether this is a good idea, what do you say?
I have overall 70000 docs with around 20 fields where 3 fields contain more text. I have indexed and tokenized 11 fields. Average size of data I put in a document is 2 kb.
Thanks
From: Dan Quaroni <[EMAIL PROTECTED]> Reply-To: "Lucene Users List" <[EMAIL PROTECTED]> To: 'Lucene Users List' <[EMAIL PROTECTED]> Subject: RE: Context-based suggestions with spell check Date: Fri, 21 Nov 2003 09:45:13 -0500
There are some important questions to ask before you discount performing all
of those searches. What are your performance requirements? How big is your
index? How are you deploying it?
-----Original Message----- From: sam s [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2003 8:15 PM To: [EMAIL PROTECTED] Subject: RE: Context-based suggestions with spell check
Levenshtein is again word based like spell check. I found Jazzy quite handy to some level for spell check. I am not worrying much about the spell check part. What I want to do is show user right spell check suggestion (when spell check returns multiple suggestions) based on other words he/she entered for the search. Once again going to same example User enters: inted motherboard spell check returns 3 suggestions for inted inter intel intek
In this situation since I have both words intel and motherboard in one document of my search collection I should able to show user something like Did you mean: intel motherboard?
One simplest way to achieve this is do search 3 times for all three suggestions with word motherboard and show user suggestion for which search got more hits. Problem with this is number of iterations involved. If there are suggestions on two words user entered, there will be all kinds of combinations and those many iterations. So I dont want to go this way.
I haven't studied in detail how lucene does indexing and search on it. I don't know whether that will help.
Has anybody come across this problem? Or I must be missing something..
Again, I apologize if you guys think this is not right post for lucene user list.
Thanks, Abhay
>From: "sam s" <[EMAIL PROTECTED]>
>Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Subject: RE: Context-based suggestions with spell check
>Date: Fri, 21 Nov 2003 00:35:13 +0000
>
>I actually thought of using search for right combination of suggestions but
>I feared of performance degrade. I'll look at levenshtein.
>
>Thanks
>
>>From: Dan Quaroni <[EMAIL PROTECTED]>
>>Reply-To: Lucene Users List <[EMAIL PROTECTED]>
>>To: 'sam s ' <[EMAIL PROTECTED]>
>>Subject: RE: Context-based suggestions with spell check
>>Date: Thu, 20 Nov 2003 19:22:51 -0500
>>
>> I would also suggest 'intend' as a possible correction.
>>
>>There are a decent number of algorithms out there for distance between to
>>words. Check out levenshtein for that.
>>
>>In terms of context based corrections, you could do a search for the word
>>combined with the word in front of it and the word behind it.
>>
>>"I just bought an inted motherboard"
>>
>>Then you do a search for "an inter", "an intel", etc and "inter
>>motherboard", "intel motherboard", etc and count the number of hits you
>>get
>>for each one and rank your suggestions accordingly.
>>
>>
>>-----Original Message-----
>>From: sam s
>>To: [EMAIL PROTECTED]
>>Sent: 11/20/03 7:07 PM
>>Subject: Context-based suggestions with spell check
>>
>>Hi,
>>
>>I am thinking to give spell check functionality to the search. I am
>>trying
>>to achieve two things to complement search.
>>
>>1. Spell check where dictionary will be composed of all text I am
>>creating
>>search index. This looks simple with some spell check implementation.
>>
>>2. The problem I am facing is how do I suggest right suggestion to a
>>wrong
>>word accompanied with other word. For example when user enters search
>>term
>>'inted' spell check returns suggestions inter, intel and intek. Now
>>problem
>>is when user searches 'inted motherboard' how do I decide that user is
>>searching for 'intel motherboard'? Where there are some items contain
>>text
>>'intel motherboard'. How do I make context-based suggestions? Does
>>anybody
>>any simple algorithm for this.
>>I know this is not related to lucene but thought may get some help from
>>community. Suggestions are appreciated.
>>
>>Thanks in advance,
>>Sam
>>
>>_________________________________________________________________
>>Tired of spam? Get advanced junk mail protection with MSN 8.
>>http://join.msn.com/?page=features/junkmail
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]
>>
>
>_________________________________________________________________
>Add photos to your e-mail with MSN 8. Get 2 months FREE*.
>http://join.msn.com/?page=features/featuredemail
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
_________________________________________________________________ STOP MORE SPAM with the new MSN 8 and get 2 months FREE* http://join.msn.com/?page=features/junkmail
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
_________________________________________________________________
Add photos to your e-mail with MSN 8. Get 2 months FREE*. http://join.msn.com/?page=features/featuredemail
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
