Okay, now I'm experiencing one of those "Simpsons already did it" moments in my life :) Nevertheless, nice to know that this problem already solved and I should write no code at all. Thanks a lot!
On 13.06.2011, at 22:11, Uwe Schindler wrote: > In Lucene trunk (will be version 4.0), all analyzers/tokenizers/tokenfilters > were moved to a new shared analyzer module. So WDF is now part of a shared > Lucene/Solr module. In 3.x, you still have to add the Solr JARS to use it. > > This TokenFilter should do what you intend to do (see the Solr > documentation, where all parameters are explained): > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimit > erFilterFactory > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -----Original Message----- >> From: Em [mailto:mailformailingli...@yahoo.de] >> Sent: Monday, June 13, 2011 1:02 PM >> To: java-user@lucene.apache.org >> Subject: Re: WordBoundTokenFilter >> >> Yes, it's part of Solr. And even in Solr there was no documentation in the > API >> - at last when I searched for it the last time. >> >> Regards, >> Em >> >> Am 13.06.2011 12:56, schrieb Denis Bazhenov: >>> It seems so. Interestingly I can't find any mentions of >> WordDelimiterTokenFilter using google. Is it part of Solr codebase? >>> On 13.06.2011, at 21:49, Em wrote: >>> >>>> Hi, >>>> >>>> sounds like the WordDelimiterTokenFilter from Solr, doesn't it? >>>> >>>> Regards, >>>> Em >>>> >>>> Am 13.06.2011 12:06, schrieb Denis Bazhenov: >>>>> Some time ago I need to tune our home grown search engine based on >> lucene to perform well on product searches. Product search is search where >> users come with part of product name and we should find the product. >>>>> >>>>> The problem here is that users doesn't provide full model name. For >> instance id product model name is "Sony PRS-A9000QF", users frequently >> search for "PRS 9000", "9000QF" etc. >>>>> >>>>> The simple and straightforward solution to this problem is to tokenize >> model names on the different character type boundary. So for "Sony PRS- >> A9000QF" we will have 5 terms: "sony", "prs", "a", "9000" "qf". This > solution >> could dramatically increase search sensitive (which is not a good thing in > a >> general search), but works well in a specialized indexes. >>>>> >>>>> So a developed such a token filter. My question is there any interest > in >> this solution for the community, and does it make sense to contribute it >> back? >>>>> --- >>>>> Denis Bazhenov <dot...@gmail.com> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -------------------------------------------------------------------- >>>>> - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>> >>>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>> >>> --- >>> Denis Bazhenov <dot...@gmail.com> >>> >>> >>> >>> >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --- Denis Bazhenov <dot...@gmail.com> --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org