One would think that all “space characters” are by definition
“whitespace”.  Not true!  See U+00A0 (NO-BREAK SPACE):
http://www.fileformat.info/info/unicode/char/00a0/index.htm

So I’m working on an app where I can no longer use WhitespaceTokenizer,
since I need to check for isSpaceChar *OR* isWhitespace.  Alternatively, I
realize I could use MappingCharFilter.
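
To make the mismatch concrete, here is a minimal sketch using only the JDK's
Character methods.  The class name SpaceCheck and the helper
isSpaceOrWhitespace are hypothetical names made up for illustration; they
are not part of Lucene, and this is not how any tokenizer is actually
implemented.

```java
class SpaceCheck {
    // Hypothetical helper: treats a code point as a separator if EITHER
    // JDK definition matches it.
    static boolean isSpaceOrWhitespace(int cp) {
        return Character.isSpaceChar(cp) || Character.isWhitespace(cp);
    }

    public static void main(String[] args) {
        int nbsp = 0x00A0; // NO-BREAK SPACE
        System.out.println(Character.isSpaceChar(nbsp));   // true
        System.out.println(Character.isWhitespace(nbsp));  // false
        System.out.println(Character.isSpaceChar('\t'));   // false
        System.out.println(Character.isWhitespace('\t'));  // true
        System.out.println(isSpaceOrWhitespace(nbsp));     // true
    }
}
```

Neither method is a superset of the other: a tab is whitespace but not a
space character, and the no-break space is the reverse, hence the *OR*.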

This had trickle-down effects on a search platform I’m working on, and it
was triggered by a user’s search.  It caused all sorts of head-scratching
until we discovered what was going on.

Craziness.

~ David
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
