Jenny - yes, it is possible to tune Lucene's analysis process very
precisely. Look at the Analyzer you're using, and consider
customizing it to your needs. An analyzer has a tokenizer followed by
token filters - there are a lot of reusable components built into
Lucene's API to choose from and configure.
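To make the tokenizer-then-filter idea concrete, here is a rough sketch in plain Java. It does not use Lucene's API: the class name, the tiny stop list, and the splitOnDashes flag are all made up for illustration. It assumes, as suspected in this thread, that the analyzer in use breaks "A-B-C" apart at the dashes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation of an analysis chain (tokenize, lowercase, drop stop words).
// None of these names are Lucene classes; they are hypothetical.
class AnalysisSketch {
    // Hypothetical stop list, much shorter than any real default list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // "Tokenizer" step: split on whitespace, and optionally on dashes too.
    // "Filter" steps: lowercase each token, then drop stop words.
    static List<String> analyze(String text, boolean splitOnDashes) {
        String delimiters = splitOnDashes ? "[\\s-]+" : "\\s+";
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split(delimiters)) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Splitting on dashes leaves "a" standing alone, so it is removed.
        System.out.println(analyze("A-B-C Inc.", true));   // prints [b, c, inc.]
        // Splitting on whitespace only keeps "a-b-c" as a single token.
        System.out.println(analyze("A-B-C Inc.", false));  // prints [a-b-c, inc.]
    }
}
```

In Lucene itself, a similar chain could probably be assembled from the built-in pieces, for example a WhitespaceTokenizer followed by a LowerCaseFilter and a StopFilter, wrapped in your own Analyzer subclass. Treat that as a starting point to check against the javadocs for your version, not a finished recipe.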
Erik
On Dec 12, 2008, at 2:46 PM, Jenny Brown wrote:
Is it possible to configure Lucene such that it doesn't tokenize on
embedded dashes, and thus doesn't consider the "A" a stop word because
it's not standing alone? I do believe the combination of dash
handling and stop words is why the query is causing problems for my
user.
On Fri, Dec 12, 2008 at 1:32 PM, Daniel Naber
<[email protected]> wrote:
On Friday, 12 December 2008, Jenny Brown wrote:
I'm trying to search for company ABC Inc. in places where it may be
mentioned as A-B-C Inc. Lucene is doing something with those dashes,
though, that prevents me from getting accurate results.
"A" (even in "A-B-C" I think) is a stopword with StandardAnalyzer's
default settings, which might cause problems. Please also check out
these hints from the FAQ:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
Regards
Daniel
--
http://www.danielnaber.de