One would think that all “space characters” are by definition “whitespace”. Not true! Take U+00A0 (NO-BREAK SPACE): http://www.fileformat.info/info/unicode/char/00a0/index.htm
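To make the mismatch concrete, here is a minimal sketch using only the JDK's `Character` methods. The class name `SpaceCheck` and the helper `isSpaceOrWhitespace` are hypothetical names made up for illustration; they are not part of Lucene or Solr.

```java
public class SpaceCheck {

    // Hypothetical helper: accepts a code point if EITHER JDK predicate does.
    // Character.isSpaceChar follows the Unicode space-separator categories,
    // while Character.isWhitespace deliberately excludes non-breaking spaces.
    static boolean isSpaceOrWhitespace(int codePoint) {
        return Character.isSpaceChar(codePoint) || Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        int nbsp = 0x00A0; // NO-BREAK SPACE

        System.out.println(Character.isSpaceChar(nbsp));   // true
        System.out.println(Character.isWhitespace(nbsp));  // false

        // The reverse mismatch also exists: a tab is whitespace but not a space char.
        System.out.println(Character.isSpaceChar('\t'));   // false
        System.out.println(Character.isWhitespace('\t'));  // true

        // The combined check covers both cases.
        System.out.println(isSpaceOrWhitespace(nbsp));     // true
        System.out.println(isSpaceOrWhitespace('\t'));     // true
    }
}
```

Neither predicate is a superset of the other, which is why a check based on only one of them can let a separator slip through.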
So I’m working on an app where I can no longer use WhitespaceTokenizer, since I need to check for isSpaceChar *OR* isWhitespace. Alternatively I could use MappingCharFilter, I realize. The problem surfaced on a search platform I’m working on, triggered by a user’s search, and had trickle-down effects from there. It caused all sorts of head-scratching until we discovered what was going on. Craziness.

~ David
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com