Re: WordBoundTokenFilter

Denis Bazhenov Mon, 13 Jun 2011 04:31:03 -0700

Okay, now I'm experiencing one of those "Simpsons already did it" moments in my
life :) Nevertheless, it's nice to know that this problem is already solved and I
don't need to write any code at all. Thanks a lot!


On 13.06.2011, at 22:11, Uwe Schindler wrote:

> In Lucene trunk (will be version 4.0), all analyzers/tokenizers/tokenfilters
> were moved to a new shared analyzer module. So WDF is now part of a shared
> Lucene/Solr module. In 3.x, you still have to add the Solr JARs to use it.
> 
> This TokenFilter should do what you intend to do (see the Solr
> documentation, where all parameters are explained):
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> 
> 
>> -----Original Message-----
>> From: Em [mailto:[email protected]]
>> Sent: Monday, June 13, 2011 1:02 PM
>> To: [email protected]
>> Subject: Re: WordBoundTokenFilter
>> 
>> Yes, it's part of Solr. And even in Solr there was no documentation in the
>> API - at least when I searched for it the last time.
>> 
>> Regards,
>> Em
>> 
>> Am 13.06.2011 12:56, schrieb Denis Bazhenov:
>>> It seems so. Interestingly, I can't find any mention of
>>> WordDelimiterTokenFilter using Google. Is it part of the Solr codebase?
>>> On 13.06.2011, at 21:49, Em wrote:
>>> 
>>>> Hi,
>>>> 
>>>> sounds like the WordDelimiterTokenFilter from Solr, doesn't it?
>>>> 
>>>> Regards,
>>>> Em
>>>> 
>>>> Am 13.06.2011 12:06, schrieb Denis Bazhenov:
>>>>> Some time ago I needed to tune our home-grown search engine, based on
>>>>> Lucene, to perform well on product searches. Product search is a search
>>>>> where users come with part of a product name and we should find the product.
>>>>> 
>>>>> The problem here is that users don't provide the full model name. For
>>>>> instance, if the product model name is "Sony PRS-A9000QF", users frequently
>>>>> search for "PRS 9000", "9000QF", etc.
>>>>> 
>>>>> The simple and straightforward solution to this problem is to tokenize
>>>>> model names on character type boundaries. So for "Sony PRS-A9000QF" we
>>>>> will have 5 terms: "sony", "prs", "a", "9000", "qf". This solution could
>>>>> dramatically increase search sensitivity (which is not a good thing in a
>>>>> general search), but it works well in specialized indexes.
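
The splitting described in the paragraph above can be sketched in a few lines.
This is only an illustration in plain Python, not the token filter discussed in
this thread and not Solr's implementation. The names char_type and
split_on_char_type are hypothetical, and the sketch assumes just three character
classes (letters, digits, everything else) with lowercasing of the output.

```python
from itertools import groupby


def char_type(ch):
    """Classify a character as a letter, a digit, or a separator."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


def split_on_char_type(text):
    """Return lowercased terms, split wherever the character type changes.

    Runs of separator characters (spaces, hyphens, ...) are dropped.
    """
    return [
        "".join(run).lower()
        for kind, run in groupby(text, key=char_type)
        if kind != "other"
    ]


print(split_on_char_type("Sony PRS-A9000QF"))
```

Applying the same function to a query such as "9000QF" gives terms that also
occur in the indexed model name, which is why partial model names can match.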
>>>>> 
>>>>> So I developed such a token filter. My question: is there any interest in
>>>>> this solution in the community, and does it make sense to contribute it
>>>>> back?
>>>>> ---
>>>>> Denis Bazhenov <[email protected]>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --------------------------------------------------------------------
>>>>> - To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>> 
>>>>> 
>>> 
>>> ---
>>> Denis Bazhenov <[email protected]>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 

---
Denis Bazhenov <[email protected]>





