How does Lucene handle phrases (literals) containing words that are not indexed (e.g. stopwords, one-letter words, numbers)? I did some tests (the Lucene demo, my own 120,000 XML documents, Cocoon search), and in all cases a search for the phrase "a specification" also seems to find documents that contain "the specification" (likewise, "D. Washington" also finds "G. Washington").
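The effect described above can be reproduced with a toy analyzer. This is a minimal sketch in plain Java (9+), not Lucene's API. The stopword list, the rule that drops one-letter tokens and pure numbers, and the names `analyze` and `matches` are assumptions made for illustration. The sketch also assumes that removed words leave no gap in the token sequence. Once the analyzer discards the unindexed words, the phrase "a specification" reduces to the single term "specification", so any document containing that word matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordPhraseDemo {
    // Hypothetical stopword list; real analyzers ship their own lists.
    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "of", "and");

    // Toy analyzer: lowercase, split on anything that is not a letter or digit,
    // then drop stopwords, one-letter tokens and pure numbers.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty() || raw.length() == 1 || STOPWORDS.contains(raw) || raw.matches("\\d+")) {
                continue;
            }
            tokens.add(raw);
        }
        return tokens;
    }

    // Phrase match on analyzed tokens: the query terms must appear consecutively.
    // Removed words leave no gap, so "a specification" collapses to "specification".
    static boolean matches(String phrase, String document) {
        List<String> query = analyze(phrase);
        if (query.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(analyze(document), query) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(analyze("a specification"));                                // [specification]
        System.out.println(matches("a specification", "Read the specification first.")); // true
        System.out.println(matches("D. Washington", "G. Washington was president."));    // true
    }
}
```

Both "false positives" from the post show up here: only "specification" and "washington" survive analysis, so the leading article or initial no longer distinguishes the documents.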
Of course, you can change the indexing behaviour so that there are no stopwords and all one-letter words and numbers are indexed, but that seems a bad approach. A better approach would be:
1) find all indexed words in the phrase and, from these words, find all documents containing them;
2) check for occurrences of the phrase by opening the original document.
I am wondering: does Lucene perform step 2)? Of course, this step burns some CPU cycles.

Hugo
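The two-step approach proposed in the post can be sketched as a small in-memory search. This is a hypothetical illustration in plain Java (9+), not a description of what Lucene does. The class `TwoPhasePhraseSearch`, its methods `add` and `search`, the stopword list and the analysis rules are all assumptions. Keeping the text in a `List<String>` stands in for "opening the original document". Step 1 intersects the posting sets of the indexed words. Step 2 re-checks the exact phrase, including stopwords and single letters, against the stored text.

```java
import java.util.*;

public class TwoPhasePhraseSearch {
    // Hypothetical stopword list used only by the indexing analyzer.
    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "of", "and");

    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> doc ids
    private final List<String> storedText = new ArrayList<>();          // "original documents"

    // Verification normalization: lowercase and split, but keep every word.
    static List<String> normalize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Indexing analyzer: drops stopwords, one-letter words and pure numbers.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String w : normalize(text)) {
            if (w.length() > 1 && !STOPWORDS.contains(w) && !w.matches("\\d+")) terms.add(w);
        }
        return terms;
    }

    int add(String text) {
        int id = storedText.size();
        storedText.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(id);
        }
        return id;
    }

    List<Integer> search(String phrase) {
        List<String> wanted = normalize(phrase);
        List<Integer> hits = new ArrayList<>();
        if (wanted.isEmpty()) return hits;

        // Step 1: candidates are the documents containing every indexed word of the phrase.
        Set<Integer> candidates = null;
        for (String term : analyze(phrase)) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (candidates == null) candidates = new TreeSet<>(docs);
            else candidates.retainAll(docs);
        }
        if (candidates == null) { // phrase has no indexed words: every document is a candidate
            candidates = new TreeSet<>();
            for (int i = 0; i < storedText.size(); i++) candidates.add(i);
        }

        // Step 2: verify the exact phrase against the stored original text.
        for (int id : candidates) {
            if (Collections.indexOfSubList(normalize(storedText.get(id)), wanted) >= 0) hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        TwoPhasePhraseSearch s = new TwoPhasePhraseSearch();
        s.add("Read the specification first.");   // id 0
        s.add("This is a specification of XML."); // id 1
        s.add("G. Washington was president.");    // id 2
        System.out.println(s.search("a specification")); // [1]
        System.out.println(s.search("D. Washington"));   // []
    }
}
```

The cost of step 2 grows with the number of candidates and the length of each stored document, which is the CPU price the post anticipates. A phrase made only of unindexed words degrades to a scan of every document.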
