Re: Searching using dash in words/abbreviations

Erik Hatcher Fri, 12 Dec 2008 11:52:54 -0800

Jenny - yes, it is possible to tune Lucene's analysis process very precisely. Look at the Analyzer you're using, and consider customizing it to your needs. An analyzer has a tokenizer followed by token filters - there are a lot of reusable components built into Lucene's API to choose from and configure.
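
To make the tokenizer-then-filters idea concrete, here is a minimal plain-Java sketch. It deliberately does not use any Lucene classes: the class name AnalysisSketch, the three-word stop list, and the splitOnDash switch are all hypothetical, chosen only to show how splitting on dashes before stop-word removal can drop the "A" from "A-B-C". The analyzer you actually use may treat dashes differently, so inspect its real token output before relying on this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisSketch {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Tokenizer stage: split on whitespace, and optionally on dashes as well.
    static List<String> tokenize(String text, boolean splitOnDash) {
        String pattern = splitOnDash ? "[\\s-]+" : "\\s+";
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split(pattern)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter stage 1: lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter stage 2: drop tokens that appear in the stop-word list.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // The "analyzer": one tokenizer followed by a chain of filters.
    static List<String> analyze(String text, boolean splitOnDash) {
        return removeStopWords(lowerCase(tokenize(text, splitOnDash)));
    }

    public static void main(String[] args) {
        // Splitting on dashes leaves "a" standing alone, so it is removed.
        System.out.println(analyze("A-B-C Inc", true));   // [b, c, inc]
        // Keeping dashes inside the token protects the "a".
        System.out.println(analyze("A-B-C Inc", false));  // [a-b-c, inc]
    }
}
```

The same text has to go through the same chain at index time and at query time; otherwise the indexed tokens and the query tokens will not line up.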


        Erik


On Dec 12, 2008, at 2:46 PM, Jenny Brown wrote:

Is it possible to configure Lucene such that it doesn't tokenize on
embedded dashes, and thus doesn't consider the "A" a stop word because
it's not standing alone?  I do believe the combination of dash
handling and stop words is why the query is causing problems for my
user.


On Fri, Dec 12, 2008 at 1:32 PM, Daniel Naber
<[email protected]> wrote:

On Friday, 12 December 2008, Jenny Brown wrote:
I'm trying to search for company ABC Inc. in places where it may be
mentioned as A-B-C Inc. Lucene is doing something with those dashes,
though, that prevents me from getting accurate results.

"A" (even in "A-B-C" I think) is a stopword with StandardAnalyzer's
default settings, which might cause problems. Please also check out these
hints from the FAQ:

http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71

Regards
Daniel

--
http://www.danielnaber.de
