I think it's predictable: isSpace (which words is based on) is based on generalCategory, which returns the proper Unicode category:

λ> generalCategory '\xa0'
Space

I agree, and I also agree that it would make sense the other way (not breaking on non-breaking spaces). Perhaps it would be a good idea to add a remark to the documentation that specifies the treatment of non-breaking spaces.

I note that Java has two distinct properties concerning whitespace:
Character.isSpaceChar('\xA0') == True
Character.isWhitespace('\xA0') == False
Contrast with:
-- \x20 is ASCII space
Character.isSpaceChar('\x20') == True
Character.isWhitespace('\x20') == True
-- \x2060 is the word-joiner (zero-width non-breaking space)
Character.isSpaceChar('\x2060') == False
Character.isWhitespace('\x2060') == False
-- \x202F is the narrow non-breaking space
Character.isSpaceChar('\x202F') == True
Character.isWhitespace('\x202F') == False
-- \x2009 is the thin space
Character.isSpaceChar('\x2009') == True
Character.isWhitespace('\x2009') == True
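
For anyone who wants to check these on their own JVM, here is a minimal sketch. The comparisons above are written in Haskell-style notation; in actual Java the code points are written as 0x00A0 or '\u00A0'. The class name WhitespaceProperties and the helper describe are names I made up for illustration, only Character.isSpaceChar and Character.isWhitespace come from the standard library.

```java
public class WhitespaceProperties {
    // Hypothetical helper (not part of any library): reports both
    // whitespace-related properties for a single Unicode code point.
    static String describe(int codePoint) {
        return String.format("U+%04X isSpaceChar=%b isWhitespace=%b",
                codePoint,
                Character.isSpaceChar(codePoint),
                Character.isWhitespace(codePoint));
    }

    public static void main(String[] args) {
        // ASCII space, no-break space, thin space,
        // narrow no-break space, word joiner.
        int[] codePoints = {0x0020, 0x00A0, 0x2009, 0x202F, 0x2060};
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```

As far as I understand it, isSpaceChar follows the Unicode separator categories, while isWhitespace deliberately excludes the non-breaking spaces, which is why the two disagree on U+00A0 and U+202F.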
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe