I think it's predictable: isSpace (on which words is based) relies on generalCategory, which returns the proper Unicode category:

λ> generalCategory '\xa0'
Space
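
To make the consequence concrete, here is a small check, assuming GHC's base library, of how isSpace and words treat U+00A0. The sample string is made up for illustration. Note the `\&` after the escape: without it, the hex escape would swallow the following "ba" of "bar".

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- Hypothetical sample: two words joined by a non-breaking space (U+00A0).
-- The empty escape \& terminates the hex escape before the letter 'b'.
sample :: String
sample = "foo\xa0\&bar"

main :: IO ()
main = do
  print (generalCategory '\xa0') -- Unicode general category of U+00A0
  print (isSpace '\xa0')         -- does isSpace accept it?
  print (words sample)           -- does words split on it?
```

With a recent GHC this should print Space, True and ["foo","bar"], i.e. words does break on the non-breaking space.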

I agree, and I also agree that the opposite behaviour (not breaking on non-breaking spaces) would make sense. Perhaps it would be a good idea to add a remark to the documentation specifying how non-breaking spaces are treated.
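
For anyone who wants the other behaviour today, a minimal sketch follows. The names isNonBreaking, isBreakingSpace and wordsNB are hypothetical (nothing like them is in base), and the choice of which characters count as non-breaking is an assumption of this sketch.

```haskell
import Data.Char (isSpace)

-- Characters treated as non-breaking here (an assumption of this sketch):
-- U+00A0 no-break space, U+2007 figure space, U+202F narrow no-break space.
isNonBreaking :: Char -> Bool
isNonBreaking c = c `elem` "\xa0\x2007\x202f"

-- Like isSpace, but rejects the non-breaking spaces listed above.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && not (isNonBreaking c)

-- Hypothetical variant of words that keeps non-breaking spaces inside words.
-- It follows the usual dropWhile/break recursion used to define words.
wordsNB :: String -> [String]
wordsNB s = case dropWhile isBreakingSpace s of
  "" -> []
  s' -> w : wordsNB rest
    where (w, rest) = break isBreakingSpace s'

main :: IO ()
main = do
  print (words "10\xa0km away")   -- splits into three words
  print (wordsNB "10\xa0km away") -- keeps "10<NBSP>km" together
```

The only change from the standard recursion is the predicate, so the variant behaves like words on input that contains no non-breaking spaces.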
 

I note that Java has two distinct properties concerning whitespace:

Character.isSpaceChar('\xA0')  == True
Character.isWhitespace('\xA0') == False

Contrast with

-- \x20 is ASCII space
Character.isSpaceChar('\x20')  == True
Character.isWhitespace('\x20') == True

-- \x2060 is the word-joiner (zero-width non-breaking space)
Character.isSpaceChar('\x2060')  == False
Character.isWhitespace('\x2060') == False

-- \x202F is the narrow non-breaking space
Character.isSpaceChar('\x202F')  == True
Character.isWhitespace('\x202F') == False

-- \x2009 is the thin space
Character.isSpaceChar('\x2009')  == True
Character.isWhitespace('\x2009') == True
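
For comparison, here is a rough Haskell transliteration of the two Java predicates, written from my reading of the Java documentation. The Haskell names isSpaceChar and isWhitespace are hypothetical, and results for other characters could differ from Java's where the two runtimes ship different Unicode versions.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Rough analogue of Java's Character.isSpaceChar: any Unicode separator
-- (space, line, or paragraph separator).
isSpaceChar :: Char -> Bool
isSpaceChar c =
  generalCategory c `elem` [Space, LineSeparator, ParagraphSeparator]

-- Rough analogue of Java's Character.isWhitespace: separators other than
-- the non-breaking ones (U+00A0, U+2007, U+202F), plus the control
-- characters U+0009..U+000D and U+001C..U+001F.
isWhitespace :: Char -> Bool
isWhitespace c =
  (isSpaceChar c && c `notElem` "\xa0\x2007\x202f")
    || c `elem` "\t\n\v\f\r\x1c\x1d\x1e\x1f"

-- Print one row per character from the table above.
main :: IO ()
main = mapM_ row "\xa0\x20\x2060\x202f\x2009"
  where
    row c = print (c, isSpaceChar c, isWhitespace c)
```

For the five characters listed above, this sketch should give the same answers as the Java methods. The word-joiner fails both tests because its general category is Format rather than Space.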


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
