Dear Erick,

Thanx for your information.

Manjula.

On Tue, Nov 30, 2010 at 6:37 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> WhitespaceAnalyzer does just that, splits the incoming stream on
> white space.
>
> From the javadocs for StandardAnalyzer:
>
> A grammar-based tokenizer constructed with JFlex
>
> This should be a good tokenizer for most European-language documents:
>
> - Splits words at punctuation characters, removing punctuation. However,
>   a dot that's not followed by whitespace is considered part of a token.
> - Splits words at hyphens, unless there's a number in the token, in which
>   case the whole token is interpreted as a product number and is not split.
> - Recognizes email addresses and internet hostnames as one token.
>
> Many applications have specific tokenizer needs. If this tokenizer does not
> suit your application, please consider copying this source code directory to
> your project and maintaining your own grammar-based tokenizer.
>
> Best
> Erick
>
> On Tue, Nov 30, 2010 at 12:06 AM, manjula wijewickrema
> <manjul...@gmail.com> wrote:
>
> > Hi Steve,
> >
> > Thanx a lot for your reply. Yes, there are only two classes, and the way
> > you have understood the problem is correct. As you instructed, I checked
> > WhitespaceAnalyzer for querying (instead of StandardAnalyzer) and it seems
> > to me that it gives better results than StandardAnalyzer. So could you
> > please let me know what the differences are between StandardAnalyzer and
> > WhitespaceAnalyzer? I highly appreciate your response.
> >
> > Thanx.
> >
> > Manjula.
> >
> > On Mon, Nov 29, 2010 at 7:32 PM, Steven A Rowe <sar...@syr.edu> wrote:
> >
> > > Hi Manjula,
> > >
> > > It's not terribly clear what you're doing here - I got lost in your
> > > description of your (two? or maybe four?) classes. Sometimes things are
> > > easier to understand if you provide more concrete detail.
> > >
> > > I suspect that you could benefit from reading the book Lucene in Action,
> > > 2nd edition:
> > >
> > > http://www.manning.com/hatcher3/
> > >
> > > You would also likely benefit from using Luke, the Lucene index browser, to
> > > better understand your indexes' contents and debug how queries match
> > > documents:
> > >
> > > http://code.google.com/p/luke/
> > >
> > > I think your question is whether you're using Analyzers correctly. It
> > > sounds like you are creating two separate indexes (one for each of your
> > > classes), and you're using SnowballAnalyzer on the indexing side for both
> > > indexes, and StandardAnalyzer on the query side.
> > >
> > > The usual advice is to use the same Analyzer on both the query and the
> > > index side. But it appears to be the case that you are taking stemmed index
> > > terms from your index #1 and then querying index #2 using these stemmed
> > > terms. If this is true, then you want the query-time analyzer in your
> > > second index not to change the query terms. You'll likely get better
> > > results using WhitespaceAnalyzer, which tokenizes on whitespace and does no
> > > further analysis, rather than StandardAnalyzer.
> > >
> > > Steve
> > >
> > > > -----Original Message-----
> > > > From: manjula wijewickrema [mailto:manjul...@gmail.com]
> > > > Sent: Monday, November 29, 2010 4:32 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Analyzer
> > > >
> > > > Hi,
> > > >
> > > > In my work, I am using Lucene and two java classes. In the first one, I
> > > > index a document and in the second one, I try to search the most relevant
> > > > document for the indexed document in the first one. In the first java
> > > > class, I use the SnowballAnalyzer in the createIndex method and
> > > > StandardAnalyzer in the searchIndex method and pass the highest frequency
> > > > terms into the second Java class.
> > > > In the second class, I use SnowballAnalyzer in the createIndex
> > > > method (this index is for the collection of documents to be searched, or
> > > > it is my database) and StandardAnalyzer in the searchIndex method (I pass
> > > > the most frequently occurring term of the first class as the search term
> > > > parameter to the searchIndex method of the second class). Using Analyzers
> > > > in this manner, what I intend is to do the stemming and stop-word removal
> > > > in both indexes (in both classes) and to search for those few
> > > > high-frequency words (of the first index) in the second index. So, if my
> > > > intention is clear to you, could you please let me know whether the way I
> > > > have used Analyzers is correct? I highly appreciate any comment.
> > > >
> > > > Thanx.
> > > > Manjula.
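
To make the difference discussed in this thread concrete, here is a minimal plain-Java sketch. It is a toy model and not Lucene code: the class AnalyzerSketch, both method names, and the stop-word list are hypothetical, and the second method only loosely mimics a punctuation-splitting, lowercasing, stop-word-removing analyzer. It ignores the JFlex grammar rules quoted above (dots not followed by whitespace, tokens containing numbers, email addresses and hostnames).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: this is NOT Lucene and does not reproduce its grammar.
class AnalyzerSketch {

    // Hypothetical stop-word list, chosen purely for illustration.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "in");

    // Models the whitespace-only behavior described in the thread:
    // split on runs of whitespace and leave every token untouched.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Crude stand-in for a punctuation-splitting analyzer (assumption):
    // split on anything that is not a letter or digit, lowercase each
    // piece, and drop stop words.
    static List<String> punctuationSplittingTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String terms = "state-of-the-art Index";
        System.out.println(whitespaceTokens(terms));
        System.out.println(punctuationSplittingTokens(terms));
    }
}
```

In this sketch the whitespace version hands each term through unchanged, which is the property Steve recommends when the query terms have already been analyzed at index time; the second version rewrites them, so they may no longer match what is stored in the index.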