Pádraig Brady wrote:
> \u3000 is ideographic space, i.e. a space generally used in east asian text
> so that alignment is maintained. Since it's a space, and not non breaking
> space it should be treated as a blank character IMHO.

It should be treated like a space character. Implementations essentially
agree on what this means. See gnulib/tests/test-c32isspace.c.

The "blank" character category has, unfortunately, so much variation among
implementations that it is not really useful. See 
gnulib/tests/test-c32isblank.c:
      case '3':
        /* Locale encoding is UTF-8.  */
        {
        #if defined __GLIBC__
          /* U+00A0 NO-BREAK SPACE */
          is = for_character ("\302\240", 2);
          ASSERT (is == 0);
        #endif
          /* U+00B7 MIDDLE DOT */
          is = for_character ("\302\267", 2);
          ASSERT (is == 0);
        #if defined __GLIBC__
          /* U+202F NARROW NO-BREAK SPACE */
          is = for_character ("\342\200\257", 3);
          ASSERT (is == 0);
        #endif
          /* U+3002 IDEOGRAPHIC FULL STOP */
          is = for_character ("\343\200\202", 3);
          ASSERT (is == 0);
          /* U+1D13D MUSICAL SYMBOL QUARTER REST */
          is = for_character ("\360\235\204\275", 4);
          ASSERT (is == 0);
          /* U+E0020 TAG SPACE */
          is = for_character ("\363\240\200\240", 4);
          ASSERT (is == 0);
        }
I could not find any non-ASCII character for which iswblank is true
across platforms.
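
To compare platforms yourself, something like the following may help. It is
only a rough sketch, not code from gnulib: the names classify and report are
made up for illustration, and it assumes that wchar_t values are Unicode code
points, which holds on glibc but not everywhere (that is one reason gnulib
offers the c32is* functions).

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: classify WC in the current locale.
   Returns a bit mask: 1 if iswspace is true, 2 if iswblank is true.  */
int
classify (wint_t wc)
{
  return (iswspace (wc) ? 1 : 0) | (iswblank (wc) ? 2 : 0);
}

/* Hypothetical helper: print one line describing how the current locale
   classifies WC.  NAME is a human-readable label for the character.  */
void
report (const char *name, wint_t wc)
{
  assert (name != NULL);
  int c = classify (wc);
  printf ("%-32s space=%d blank=%d\n", name, (c & 1) != 0, (c & 2) != 0);
}
```

Call setlocale (LC_ALL, "") from main (with <locale.h> included) under a
UTF-8 locale, then e.g. report ("U+3000 IDEOGRAPHIC SPACE", 0x3000) and
report ("U+00A0 NO-BREAK SPACE", 0x00A0). The "blank" column is the one I
would expect to differ between platforms.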

Bruno



