In my experience, StandardAnalyzer does not index punctuation. Instead, something like old:fart is indexed as two terms, old and fart. QueryParser will then generate a query of old within 1 of fart for the query old:fart. This has been the case for all punctuation I have run into. Things like f.b.i are handled differently, though: it is indexed as fbi, i.e. the dots are removed. That is part of the acronym handling. There are a couple of other special handlers as well, but in general punctuation is ignored, except that QueryParser will look for the words broken by the punctuation next to each other.
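
To make the behavior described above concrete, here is a rough plain-Java sketch. It is not Lucene code and does not call StandardAnalyzer; the class PunctuationSketch and the method tokenize are hypothetical names made up for illustration, and the acronym rule (single letters separated by dots) is only an assumption about what the real handling looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; this is NOT Lucene's StandardAnalyzer.
class PunctuationSketch {

    // Hypothetical helper that approximates the behavior described in the post:
    // punctuation splits words, and dotted acronyms lose their dots.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on whitespace first so each chunk can be checked for the acronym shape.
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue;
            }
            if (chunk.matches("(?:\\p{Alpha}\\.)+\\p{Alpha}?")) {
                // Assumed acronym shape: single letters separated by dots, e.g. f.b.i
                tokens.add(chunk.replace(".", "").toLowerCase(Locale.ROOT));
            } else {
                // Any other punctuation acts as a separator and is not kept.
                for (String part : chunk.split("[^\\p{Alnum}]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("old:fart"));   // [old, fart]
        System.out.println(tokenize("f.b.i"));      // [fbi]
        System.out.println(tokenize("work: done")); // [work, done]
    }
}
```

With this sketch, "work:" and "work" produce the same term, which is why a search on either form can match as long as the same processing is applied at index time and at search time.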

-Mark

Felix Litman wrote:
Yes, thank you. That would be a good solution. But we are using Lucene's
StandardAnalyzer. It seems to index words with colons ":" and other
punctuation by default. Is there a simple way to have the analyzer not
index colons specifically, and punctuation in general?

Erick Erickson wrote:
I've got to ask why you'd want to search on colons. Why not just index the
words without colons and search without them too? Let's say you index the
word "work:". Do you really want to have a search on "work" fail?

By and large, you're better off indexing and searching without
punctuation....

Best
Erick

On 1/28/07, Felix Litman wrote:
Is there a simple way to turn off field-search syntax in the Lucene
parser, and have Lucene recognize words ending in a colon ":" as search
terms instead?

Such words are very common occurrences for our documents (or any plain
text), but Lucene does not seem to find them. :-(

Thank you,
Felix




