I expected that a StandardAnalyzer configured with a stop-word list that merges ENGLISH_STOP_WORDS with a handful of my own additions, including the most common file suffixes [.txt, .xml, .doc, etc.], would eliminate every occurrence of those terms from the resulting index. However, when I dump the index I see that the last element of the file name, concatenated with a 'dot' and the suffix, is indexed as a single term. So I did avoid indexing the suffix as a term of its own, but I am losing the final element of any file name that contains embedded white space, because that element never appears in the index by itself.
Please advise how to force the parser to recognize and ignore the 'dot'. Thank you.
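
To make the symptom concrete, here is a minimal plain-Java sketch of what I am seeing and of one workaround I am considering. It does not use Lucene at all: `tokens` is only a stand-in for the analyzer (lowercase, split on white space, drop stop words), and the class name, the `splitOnDots` helper, the three-entry stop list, and the sample file name are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FileNameTokens {
    // Hypothetical stop list: only the file suffixes, standing in for the merged stop-word set.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("txt", "xml", "doc"));

    // Workaround under consideration: turn every dot into a space before analysis,
    // so the suffix becomes a separate token that the stop list can remove.
    static String splitOnDots(String fileName) {
        return fileName.replace('.', ' ');
    }

    // Stand-in for the analyzer (NOT Lucene): lowercase, split on white space, drop stop words.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String name = "quarterly sales report.txt"; // made-up file name
        System.out.println(tokens(name));              // last word stays glued to the suffix
        System.out.println(tokens(splitOnDots(name))); // suffix is split off and dropped
    }
}
```

The obvious drawback is that replacing every dot would also break up names that legitimately contain dots, such as version numbers, so I would prefer a way to do this inside the analyzer if one exists.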