Hi Leonardo,

>  (gdb) print wide_char
>  $2 = 128374 L'\x1f576'
>  (gdb) quit
> 
> L'\x1f576' (in wide_char) is probably the `dark sunglasses' (U+1F576)
> unicode character

It is.  One of three non-ASCII characters in that subject.

    $ uip/scan -file 8759.2.email -format '%(decode{subject})' |
    > iconv -t ucs-4le |
    > hexdump -ve '5/4 "  % 8x" /0 "\n"'
         1f576        53        75        6e      2019
            73        20        6f        75        74
            2c        20        73        61        76
            69        6e        67        73        20
            4f        4e      2014        73        68
            6f        70        20        6d        61
            6a        6f        72        20        61
            70        70        6c        69        61
            6e        63        65        20        64
            65        61        6c        73        20
            6e        6f        77         a          
    $

> and directly trying to:
> 
>  wcwidth(L'\x1f576')
> 
> ...returns `-1'.

That would do it: with wcwidth(3) returning -1, the assert(w >= 0) in
cpstripped() fails.  Could you apply the attached patch and re-run?  I'm
mainly interested in how that locale classifies it, e.g. iswprint(3).

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
diff --git sbr/fmt_scan.c sbr/fmt_scan.c
index c75db3ec..58977359 100644
--- sbr/fmt_scan.c
+++ sbr/fmt_scan.c
@@ -247,6 +247,7 @@ cpstripped (charstring_t dest, size_t max, char *str)
     while (*str != '\0' && len > 0 && max > 0) {
 #ifdef MULTIBYTE_SUPPORT
 	char_len = mbtowc(&wide_char, str, len);
+        fprintf(stderr, "mbtowc(%#x) = %d\n", wide_char, char_len);
 
 	/*
 	 * If mbrtowc() failed, then we have a character that isn't valid
@@ -259,6 +260,7 @@ cpstripped (charstring_t dest, size_t max, char *str)
 	if (char_len < 0) {
 	    altstr = "?";
 	    char_len = mbtowc(&wide_char, altstr, 1);
+            fprintf(stderr, "    mbtowc(%#x) = %d\n", wide_char, char_len);
 	}
 
 	if (char_len <= 0) {
@@ -267,6 +269,8 @@ cpstripped (charstring_t dest, size_t max, char *str)
 
 	len -= char_len;
 
+        fprintf(stderr, "cntrl:%d  space:%d  blank:%d  print:%d\n",
+            iswcntrl(wide_char), iswspace(wide_char), iswblank(wide_char), iswprint(wide_char));
 	if (iswcntrl(wide_char) || iswspace(wide_char)) {
 	    str += char_len;
 #else /* MULTIBYTE_SUPPORT */
@@ -288,6 +292,7 @@ cpstripped (charstring_t dest, size_t max, char *str)
 
 #ifdef MULTIBYTE_SUPPORT
 	w = wcwidth(wide_char);
+        fprintf(stderr, "wcwidth(%#x) = %d\n", wide_char, w);
 	assert(w >= 0);
 	if (max >= (size_t) w) {
 	    charstring_push_back_chars (dest, altstr ? altstr : str, char_len, w);
_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
