Re: [Haskell-cafe] bug in Prelude.words?

malcolm.wallace Mon, 28 Mar 2011 09:55:16 -0700

I think it's predictable: words is built on isSpace, which in turn is based on generalCategory, and that returns the proper Unicode category:

λ> generalCategory '\xa0'
Space
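To make the behaviour concrete, here is a minimal sketch that checks the category of U+00A0 and shows words splitting on it. It assumes a GHC toolchain with Data.Char from base; the name nbsp is only for illustration.

```haskell
import Data.Char (generalCategory, isSpace)

-- U+00A0 NO-BREAK SPACE (illustrative name).
nbsp :: Char
nbsp = '\xa0'

main :: IO ()
main = do
  -- The Unicode general category of U+00A0 is Space (Zs).
  print (generalCategory nbsp)
  -- isSpace accepts it...
  print (isSpace nbsp)
  -- ...so words treats it as a word separator.
  -- (Concatenation avoids "\xa0b" being parsed as the single escape '\xa0b'.)
  print (words ("foo" ++ [nbsp] ++ "bar"))
```

On GHC this prints Space, True and ["foo","bar"].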

I agree, and I also agree that the other behaviour (not breaking on non-breaking spaces) would make sense. Perhaps the documentation should include a remark specifying how non-breaking spaces are treated.
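For the other reading (not breaking on non-breaking spaces), one option is to keep the Haskell Report definition of words and swap its predicate. This is a hypothetical sketch: wordsBreaking, isBreakingSpace and isNonBreakingSpace are illustrative names, and treating exactly U+00A0, U+2007 and U+202F as non-breaking is an assumption borrowed from the exclusions listed in the documentation of Java's Character.isWhitespace.

```haskell
import Data.Char (isSpace)

-- Assumed set of non-breaking spaces (hypothetical choice):
-- U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE, U+202F NARROW NO-BREAK SPACE.
isNonBreakingSpace :: Char -> Bool
isNonBreakingSpace c = c `elem` ['\x00A0', '\x2007', '\x202F']

-- Split only on whitespace that is not a non-breaking space.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && not (isNonBreakingSpace c)

-- Same shape as the Report's definition of words, with the predicate replaced.
wordsBreaking :: String -> [String]
wordsBreaking s = case dropWhile isBreakingSpace s of
  "" -> []
  s' -> w : wordsBreaking s''
    where
      (w, s'') = break isBreakingSpace s'

main :: IO ()
main = do
  let sentence = "Mr.\xa0Smith went home"
  print (words sentence)          -- ["Mr.","Smith","went","home"]
  print (wordsBreaking sentence)  -- ["Mr.\160Smith","went","home"]
```

The design choice is only which predicate decides a word boundary; everything else behaves like the standard words.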

I note that Java has two distinct properties concerning whitespace:

Character.isSpaceChar('\u00A0') == true
Character.isWhitespace('\u00A0') == false

Contrast with

// U+0020 is the ASCII space
Character.isSpaceChar('\u0020') == true
Character.isWhitespace('\u0020') == true

// U+2060 is the word joiner (zero-width non-breaking space)
Character.isSpaceChar('\u2060') == false
Character.isWhitespace('\u2060') == false

// U+202F is the narrow non-breaking space
Character.isSpaceChar('\u202F') == true
Character.isWhitespace('\u202F') == false

// U+2009 is the thin space
Character.isSpaceChar('\u2009') == true
Character.isWhitespace('\u2009') == true
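The two Java predicates can be approximated in Haskell with generalCategory. In the sketch below, javaIsSpaceChar and javaIsWhitespace are hypothetical helpers written from the Javadoc descriptions of Character.isSpaceChar (categories Zs, Zl, Zp) and Character.isWhitespace (those separators minus U+00A0, U+2007 and U+202F, plus the controls U+0009 to U+000D and U+001C to U+001F); they are not a port of Java's code. The extra isSpace column suggests that, for these five characters, GHC's isSpace agrees with isSpaceChar rather than with isWhitespace.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)
import Numeric (showHex)

-- Hypothetical emulation of Character.isSpaceChar:
-- any space, line or paragraph separator.
javaIsSpaceChar :: Char -> Bool
javaIsSpaceChar c =
  generalCategory c `elem` [Space, LineSeparator, ParagraphSeparator]

-- Hypothetical emulation of Character.isWhitespace: the separators above
-- except the non-breaking spaces, plus a fixed set of control characters.
javaIsWhitespace :: Char -> Bool
javaIsWhitespace c =
  (javaIsSpaceChar c && c `notElem` ['\x00A0', '\x2007', '\x202F'])
    || c `elem` ['\t', '\n', '\v', '\f', '\r', '\x1C', '\x1D', '\x1E', '\x1F']

-- The characters from the Java examples, in the same order.
samples :: [Char]
samples = ['\x20', '\xA0', '\x2060', '\x202F', '\x2009']

main :: IO ()
main = mapM_ row samples
  where
    -- One line per character: code point, both Java-style predicates, isSpace.
    row c =
      putStrLn $
        "U+" ++ showHex (fromEnum c) ""
          ++ "  isSpaceChar=" ++ show (javaIsSpaceChar c)
          ++ "  isWhitespace=" ++ show (javaIsWhitespace c)
          ++ "  isSpace=" ++ show (isSpace c)
```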

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
