Hi Jay,

Sorry, lapsus calami: I meant Lucene *contrib*, not "config". Have a look:
http://lucene.apache.org/java/2_3_1/api/contrib-analyzers/index.html

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Jay <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 6:15:54 PM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

Sorry, I could not find the filter in the 2.3 API class list (core + contrib + test). I am not aware of a Lucene config file either. Could you please tell me where it is in the 2.3 release?

Thanks!

Jay

Otis Gospodnetic wrote:
> Jay,
>
> Have a look at Lucene config, it's all there, including tests. This filter
> will take a token such as "foobar" and chop it up into n-grams (e.g. foobar
> -> fo oo ob ba ar would be a set of bi-grams). You can specify the n-gram
> size and even min and max n-gram size.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Jay <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, March 25, 2008 1:32:24 PM
> Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
>
> Hi Uwe,
>
> I am curious: what is NGramStemFilter? Is it a combination of Porter
> stemming and word n-gram identification?
>
> Thanks!
>
> Jay
>
> Uwe Goetzke wrote:
>> Hi Ivan,
>> No, we do not use StandardAnalyzer or StandardTokenizer.
>>
>> Most data is processed by
>>   fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
>>   result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
>>   result = new org.apache.lucene.analysis.LowerCaseFilter(result);
>>   result = new org.apache.lucene.analysis.NGramStemFilter(result,2); // just a bigram tokenizer
>>
>> We use our own query parser. The bigrams are searched with a tolerant phrase
>> query that scores, within a document, the largest bigram clusters covering
>> the phrase token.
>>
>> Best Regards
>>
>> Uwe
>>
>> -----Original Message-----
>> From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
>> Sent: Friday, March 21, 2008 16:25
>> To: java-user@lucene.apache.org
>> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>>
>> Hi Uwe,
>>
>> Could you tell us which Analyzer you used when you measured such a big
>> indexing speedup?
>> If you use StandardAnalyzer (which uses StandardTokenizer), that may be the
>> reason. See the second-to-last report in the thread "Indexing
>> Speed: 2.3 vs 2.2 (real world numbers)". According to the reporter, Jake
>> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl,
>> which is generated by JFlex instead of JavaCC.
>> I am asking because I noticed a great speedup in adding documents to the
>> index in our system. We time this step in debug mode. NOW
>> THEY ARE ADDED 5 TIMES FASTER!!!
>> At the same time, the total indexing process in our case has improved by
>> only about 8%. As our system is very big and complex, I am
>> wondering whether the whole indexing process really is that much faster
>> and our system causes the slowdown, or whether Lucene does
>> some optimizations on the index, merges, or something else, and that is
>> why the total indexing process is not correspondingly faster.
>>
>> Best Regards,
>> Ivan
>>
>> Uwe Goetzke wrote:
>>> This week I switched the Lucene library version on one customer system.
>>> The indexing time went down from 46m32s to 16m20s for the complete task
>>> including optimisation. Great Job!
>>> We index product catalogs from several suppliers; in this case around
>>> 56,000 product groups and 360,000 products including descriptions were
>>> indexed.
>>>
>>> Regards
>>>
>>> Uwe
>>>
>>> -----------------------------------------------------------------------
>>> Healy Hudson GmbH - D-55252 Mainz Kastel
>>> Managing Director: Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
>>>
>>> This email is confidential. If you are not the intended recipient, you must
>>> not disclose or use the information contained in it. If you have received
>>> this email in error, please tell us immediately by return email and delete
>>> the document.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
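
As a rough illustration of the n-gram chopping described in the thread (foobar -> fo oo ob ba ar), here is a minimal plain-Java sketch. It deliberately does not use the Lucene contrib classes. The class name NGramSketch and the method ngrams(token, minGram, maxGram) are hypothetical names chosen for this example, and the output order (all grams of the smallest size first, then the larger sizes) is an assumption, not necessarily what the contrib filter emits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class NGramSketch {

    // Returns all character n-grams of the token with sizes minGram..maxGram,
    // smaller sizes first, each size scanned left to right.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<String>();
        for (int n = minGram; n <= maxGram; n++) {
            // A token shorter than n yields no grams of that size.
            for (int start = 0; start + n <= token.length(); start++) {
                grams.add(token.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Bigrams of the example token from the thread.
        System.out.println(ngrams("foobar", 2, 2)); // [fo, oo, ob, ba, ar]
        // Min and max size together: bigrams first, then trigrams.
        System.out.println(ngrams("foobar", 2, 3));
    }
}
```

With minGram equal to maxGram this behaves like the "just a bigram tokenizer" step in the analyzer chain quoted above. In a real analyzer, the same loop would run once per token coming out of the lower-casing filter.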