StandardAnalyzer should not be indexing punctuation from my
experience...instead something like old:fart would be indexed as old and
fart. QueryParser will then generate a query of old within 1 of fart for
the query old:fart. This is the case for all punctuation I have run
into. Things like f.b.i are handled differently though. Its indexed as
fbi...ie the dots are removed...thats part of the acronym handling.
There are a couple other special handlers as well...but in general
punctuation is ignored...except that QueryParser will look for the words
broken by the punctuation next to each other.
-Mark
Felix Litman wrote:
Yes, thank you. That would be a good solution. But we are using Lucene's Standard
Analyzer. It seems to index words with colons ":" and other punctuation by
default. Is there a simple way to have the Analyzer not to index colons specifically and
punctuation in general?
Erick Erickson <[EMAIL PROTECTED]> wrote: I've got to ask why you'd want to
search on colons. Why not just index the
words without colons and search without them too? Let's say you index the
word "work:" Do you really want to have a search on "work" fail?
By and large, you're better off indexing and searching without
punctuation....
Best
Erick
On 1/28/07, Felix Litman wrote:
Is there a simple way to turn off field-search syntax in the Lucene
parser, and have Lucene recognize words ending in a colon ":" as search
terms instead?
Such words are very common occurrences for our documents (or any plain
text), but Lucene does not seem to find them. :-(
Thank you,
Felix
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]