Uwe,

This is a little off the thread topic, but I was wondering how your search relevance and search performance have fared with this bigram-based index. Is it significantly better than before you used the NGramAnalyzer?

-jake

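For readers following along, here is a minimal sketch of what the analysis chain in the quoted message below appears to do to a piece of text: split on whitespace, fold accents, lowercase, then emit character bigrams. It does not use Lucene, and it is not Uwe's code. The class and method names are hypothetical, only the ö -> oe mapping comes from the thread (the other umlaut mappings are assumptions), and the behaviour of his NGramStemFilter on tokens shorter than two characters is a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the chain described in the thread; not Lucene code.
final class BigramChainSketch {

    // Accent folding. Only "ö -> oe" is stated in the thread; the rest is assumed.
    static String foldUmlauts(String s) {
        return s.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue");
    }

    // Overlapping character bigrams of one token.
    // Assumption: tokens shorter than two characters pass through unchanged.
    static List<String> bigrams(String token) {
        List<String> grams = new ArrayList<>();
        if (token.length() < 2) {
            grams.add(token);
            return grams;
        }
        for (int i = 0; i + 2 <= token.length(); i++) {
            grams.add(token.substring(i, i + 2));
        }
        return grams;
    }

    // Whitespace split, accent folding, lowercasing, bigrams: same order as the quoted chain.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token; skip it
            }
            String normalized = foldUmlauts(token).toLowerCase(Locale.ROOT);
            out.addAll(bigrams(normalized));
        }
        return out;
    }

    public static void main(String[] args) {
        // "Köln Dom" becomes the bigrams of "koeln" followed by those of "dom".
        System.out.println(analyze("K\u00f6ln Dom"));
    }
}
```

Because every token is broken into overlapping two-character pieces, a misspelled query term still shares most of its bigrams with the indexed term, which is presumably what makes a tolerant phrase query over bigrams workable.
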
On 3/24/08, Uwe Goetzke <[EMAIL PROTECTED]> wrote:
> Hi Ivan,
>
> No, we do not use StandardAnalyzer or StandardTokenizer.
>
> Most data is processed by:
>
>     fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
>     // ISOLatin2AccentFilter: ISOLatin1AccentFilter modified so that ö -> oe
>     result = new ISOLatin2AccentFilter(result);
>     result = new org.apache.lucene.analysis.LowerCaseFilter(result);
>     // just a bigram tokenizer
>     result = new org.apache.lucene.analysis.NGramStemFilter(result, 2);
>
> We use our own query parser. The bigrams are searched with a tolerant phrase
> query, which scores in each document the largest clusters of bigrams covering
> the phrase token.
>
> Best Regards
>
> Uwe
>
> -----Original Message-----
> From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
> Sent: Friday, 21 March 2008 16:25
> To: java-user@lucene.apache.org
> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>
> Hi Uwe,
>
> Could you tell us which Analyzer you were using when you saw such a big
> indexing speedup? If you use StandardAnalyzer (which uses StandardTokenizer),
> that may be the reason. See the second-to-last report in the thread "Indexing
> Speed: 2.3 vs 2.2 (real world numbers)". According to the reporter, Jake
> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl,
> which is generated by JFlex instead of JavaCC.
> I am asking because I noticed a great speedup in adding documents to the
> index in our system. We time this step in debug mode. NOW THEY ARE ADDED
> 5 TIMES FASTER!!!
> At the same time, though, the total indexing process in our case improved by
> only about 8%. As our system is very big and complex, I am wondering whether
> the whole indexing process really has become that much faster and our own
> system is causing the slowdown, or whether Lucene does some optimizations on
> the index, merges, or something else, and that is why the total indexing
> process is not correspondingly faster.
>
> Best Regards,
> Ivan
>
> Uwe Goetzke wrote:
> > This week I switched the Lucene library version on one customer system.
> > The indexing time went down from 46m32s to 16m20s for the complete task,
> > including optimisation. Great job!
> > We index product catalogs from several suppliers; in this case around
> > 56,000 product groups and 360,000 products, including descriptions, were
> > indexed.
> >
> > Regards
> >
> > Uwe
> >
> > -----------------------------------------------------------------------
> > Healy Hudson GmbH - D-55252 Mainz Kastel
> > Managing Director Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
> >
> > This email is confidential. If you are not the intended recipient, you
> > must not disclose or use the information contained in it. If you have
> > received this email in error, please tell us immediately by return email
> > and delete the document.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
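
The figures quoted in this thread can be put side by side with a little arithmetic. The sketch below computes the overall speedup from Uwe's two timings. It then estimates what share of Ivan's old total run time the document-adding step would have taken, if that step alone became 5 times faster and the total time fell by about 8%. This rests on assumptions that the thread does not confirm: that "about 8%" means a reduction in total wall-clock time, and that nothing else changed between the two runs. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for the timing figures quoted in the thread.
final class IndexingTimingsSketch {

    // Parses a duration written like "46m32s" into seconds.
    static int toSeconds(String duration) {
        int m = duration.indexOf('m');
        int minutes = Integer.parseInt(duration.substring(0, m));
        int seconds = Integer.parseInt(duration.substring(m + 1, duration.length() - 1));
        return minutes * 60 + seconds;
    }

    // Overall speedup factor: old time divided by new time.
    static double speedup(String before, String after) {
        return (double) toSeconds(before) / toSeconds(after);
    }

    // Amdahl-style estimate. If a step taking share p of the old total becomes
    // k times faster, the total shrinks by r = p * (1 - 1/k), so p = r / (1 - 1/k).
    static double acceleratedShare(double totalReduction, double stepSpeedup) {
        return totalReduction / (1.0 - 1.0 / stepSpeedup);
    }

    public static void main(String[] args) {
        // Uwe's run: 46m32s before, 16m20s after.
        System.out.println(String.format(Locale.ROOT, "speedup: %.2f", speedup("46m32s", "16m20s")));
        // Ivan's run: step 5 times faster, total assumed to be 8% shorter.
        System.out.println(String.format(Locale.ROOT, "share: %.2f", acceleratedShare(0.08, 5.0)));
    }
}
```

Under those assumptions, Uwe's complete task ran roughly 2.85 times faster, while in Ivan's system the accelerated step would account for only about a tenth of the old total. That would be consistent with most of his time being spent outside the step he measured, though the thread itself does not settle the question.
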