Hi Jay,

Sorry, a slip of the pen: that would be Lucene *contrib*, not config.
Have a look:
http://lucene.apache.org/java/2_3_1/api/contrib-analyzers/index.html
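
If you want to see what an n-gram filter does before opening the contrib jar, here is a minimal, dependency-free Java sketch of the behaviour described further down the thread. It chops a token such as "foobar" into character n-grams between a minimum and a maximum size. This is not the contrib implementation. The contrib classes appear to live in org.apache.lucene.analysis.ngram (e.g. NGramTokenFilter), so check the linked javadoc. The class and method names below are hypothetical, and emitting all grams of one size before the next is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Returns every character n-gram of `token` whose length lies in [minGram, maxGram].
    // Grams are emitted size by size (all minGram-grams first), left to right within a size.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            // Slide a window of width n across the token; tokens shorter than n yield nothing.
            for (int start = 0; start + n <= token.length(); start++) {
                grams.add(token.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("foobar", 2, 2)); // [fo, oo, ob, ba, ar]
        System.out.println(ngrams("foobar", 2, 3)); // bigrams, then trigrams
    }
}
```

With minGram = maxGram = 2, this reproduces the "foobar" bigram example. Raising maxGram adds the longer grams after the bigrams.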

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Jay <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 6:15:54 PM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

Sorry, I could not find the filter in the 2.3 API class list (core + 
contrib + test). I am not aware of a Lucene config file either. Could you 
please tell me where it is in the 2.3 release?

Thanks!

Jay

Otis Gospodnetic wrote:
> Jay,
> 
> Have a look at Lucene config, it's all there, including tests.  This filter 
> will take a token such as "foobar" and chop it up into n-grams (e.g. foobar 
> -> fo oo ob ba ar would be a set of bi-grams).  You can specify the n-gram 
> size and even min and max n-gram size.
> 
> Otis
> 
> ----- Original Message ----
> From: Jay <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, March 25, 2008 1:32:24 PM
> Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
> 
> Hi Uwe,
> 
> I am curious: what is NGramStemFilter? Is it a combination of Porter 
> stemming and word n-gram identification?
> 
> Thanks!
> 
> Jay
> 
> Uwe Goetzke wrote:
>> Hi Ivan,
>> No, we do not use StandardAnalyzer or StandardTokenizer.
>>
>> Most data is processed by:
>>     fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
>>     result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
>>     result = new org.apache.lucene.analysis.LowerCaseFilter(result);
>>     result = new org.apache.lucene.analysis.NGramStemFilter(result, 2); // just a bigram tokenizer
>>
>> We use our own query parser. The bigrams are searched with a tolerant phrase 
>> query that scores, within a document, the largest clusters of bigrams 
>> covering the phrase tokens.
>>
>> Best Regards
>>
>> Uwe
>>
>> -----Original Message-----
>> From: Ivan Vasilev [mailto:[EMAIL PROTECTED] 
>> Sent: Friday, March 21, 2008 4:25 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
>>
>> Hi Uwe,
>>
>> Could you tell me which Analyzer you use to get such a big indexing 
>> speedup?
>> If you use StandardAnalyzer (which uses StandardTokenizer), that may be 
>> the reason. See the second-to-last report in the thread "Indexing 
>> Speed: 2.3 vs 2.2 (real world numbers)". According to the reporter, Jake 
>> Mannix, this is because StandardTokenizer now uses StandardTokenizerImpl, 
>> which is now generated by JFlex instead of JavaCC.
>> I am asking because I noticed a great speedup when adding documents to 
>> the index in our system. We measure this step in debug mode. NOW 
>> THEY ARE ADDED 5 TIMES FASTER!!!
>> At the same time, however, the total indexing process in our case 
>> improved by only about 8%. As our system is very big and complex, I am 
>> wondering whether the Lucene part of indexing really got this much faster 
>> and our own system accounts for the rest of the time, or whether Lucene 
>> does some extra work (optimizing the index, merging or something else) 
>> that keeps the total indexing process from being correspondingly faster.
>>
>> Best Regards,
>> Ivan
>>
>>
>>
>> Uwe Goetzke wrote:
>>> This week I switched the Lucene library version (2.2 -> 2.3.1) on one
>>> customer system. The indexing time went down from 46m32s to 16m20s for
>>> the complete task, including optimisation. Great job!
>>> We index product catalogs from several suppliers; in this case around
>>> 56,000 product groups and 360,000 products, including descriptions, were
>>> indexed.
>>>
>>> Regards
>>>
>>> Uwe
>>>
>>>
>>>
>>> -----------------------------------------------------------------------
>>> Healy Hudson GmbH - D-55252 Mainz Kastel
>>> Geschäftsführer Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
>>>
>>> Diese Email ist vertraulich. Wenn Sie nicht der beabsichtigte Empfänger 
>>> sind, dürfen Sie die Informationen nicht offen legen oder benutzen. Wenn 
>>> Sie diese Email durch einen Fehler bekommen haben, teilen Sie uns dies 
>>> bitte umgehend mit, indem Sie diese Email an den Absender zurückschicken. 
>>> Bitte löschen Sie danach diese Email.
>>> This email is confidential. If you are not the intended recipient, you must 
>>> not disclose or use this information contained in it. If you have received 
>>> this email in error please tell us immediately by return email and delete 
>>> the document.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>
> 
> 
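
Uwe's analyzer chain (whitespace tokenizer, accent folding with ö -> oe, lowercasing, then bigrams) can be approximated in plain Java to show roughly which terms end up in the index. ISOLatin2AccentFilter and NGramStemFilter appear to be Uwe's own classes: he describes the former as a modified ISOLatin1AccentFilter, and Jay could not find the latter in the 2.3 API. So this is a hypothetical, dependency-free sketch of the described behaviour, not their code and not a Lucene API. Only the ö -> oe mapping comes from the thread. The other umlaut and ß mappings, and keeping one-character tokens whole, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BigramPipelineSketch {

    // Accent folding step (hypothetical stand-in for ISOLatin2AccentFilter).
    // Only lower-case o-umlaut -> "oe" is confirmed in the thread; the rest are assumptions.
    static String foldAccents(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (char c : token.toCharArray()) {
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut (assumed)
                case '\u00f6': sb.append("oe"); break; // o-umlaut (from the thread)
                case '\u00fc': sb.append("ue"); break; // u-umlaut (assumed)
                case '\u00c4': sb.append("Ae"); break; // upper-case A-umlaut (assumed)
                case '\u00d6': sb.append("Oe"); break; // upper-case O-umlaut (assumed)
                case '\u00dc': sb.append("Ue"); break; // upper-case U-umlaut (assumed)
                case '\u00df': sb.append("ss"); break; // sharp s (assumed)
                default:       sb.append(c);
            }
        }
        return sb.toString();
    }

    // Whitespace tokenize, fold accents, lowercase, then emit character bigrams.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {   // whitespace tokenizer
            if (token.isEmpty()) {
                continue;                                   // input was blank
            }
            String t = foldAccents(token).toLowerCase(Locale.ROOT);
            if (t.length() < 2) {
                terms.add(t);                               // assumption: keep 1-char tokens whole
                continue;
            }
            for (int i = 0; i + 2 <= t.length(); i++) {
                terms.add(t.substring(i, i + 2));           // bigram starting at position i
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [gr, ro, oe, es, ss, se, xl]
        System.out.println(analyze("Gr\u00f6\u00dfe XL"));
    }
}
```

The order follows Uwe's chain: accent folding runs before lowercasing. That is why the sketch also maps the upper-case umlauts. Otherwise a capitalised Ö would be lowercased to ö after the folding step and never become "oe".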

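Ivan's two numbers (adding documents 5 times faster, but total indexing only about 8% faster) fit together if adding documents was only a small share of his total indexing time. This is Amdahl's law. Below is a rough worked estimate. It assumes that everything outside the add-document step takes exactly as long as before, and it reads "about 8%" as an 8% reduction in total time. Both are assumptions, not figures from the thread. Here f is the fraction of the old total time spent adding documents.

```latex
\begin{aligned}
T_{\text{new}} &= T_{\text{old}}\left((1-f) + \frac{f}{5}\right) = T_{\text{old}}\left(1 - \frac{4f}{5}\right),\\
1 - \frac{4f}{5} &= 0.92 \;\Rightarrow\; f = \frac{5 \cdot 0.08}{4} = 0.10 .
\end{aligned}
```

Under these assumptions, adding documents was only about 10% of the old total time. Even an infinitely fast add step could then cut the total by at most about 10%.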