>  no_escape po c | c `elem` poNoEscX po = True
>  no_escape _ '\t' = True  -- tabs will likely be converted to spaces
>  no_escape _ '\n' = True
> -no_escape po c = if (poIsprint po) then isPrint c
> +no_escape po c = if (poIsprint po) then c_isprint c
>                                     else isPrintableAscii c
>                   ||  c >= '\x80' && po8bit po

I think what you're seeing is that this doesn't work in multi-byte
locales, such as those that use UTF-8.

In a multi-byte locale, a byte is not in itself printable or
unprintable; it is characters that are printable or not, and a
character can be represented by several bytes.  For example, in UTF-8
the byte sequence

  C3 AA

represents a French ``e-circumflex'' (U+00EA), but it doesn't make
sense to ask whether C3 or AA is printable on its own.

An approach that should work well in self-synchronising encodings (for
example UTF-8) would be to find the longest printable prefix of the
given string (possibly 0 bytes), then quote one byte, and try again.
Your mileage will vary with non-self-synchronising encodings -- it
should work fairly well with EUC, but will most probably break down
completely with ISO-2022-JP.
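
The quote-and-retry loop described above could be sketched in C roughly
as follows.  This is only an illustration, under assumptions that are
not in the message itself: the encoding is hard-coded to UTF-8 rather
than taken from the locale, "printable" is crudely approximated as
ASCII 0x20..0x7E plus any code point from U+00A0 upwards, the \xHH
quoting style is arbitrary, and the names utf8_printable_char and
escape_utf8 are made up.  The tab and newline special cases from the
quoted patch are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Length in bytes of the well-formed, printable UTF-8 character at s
   (at most n bytes available), or 0 if there is none.  "Printable" is
   an assumed stand-in: ASCII 0x20..0x7E, or any code point >= U+00A0. */
static size_t utf8_printable_char(const unsigned char *s, size_t n)
{
    unsigned long cp, min;
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80) return (s[0] >= 0x20 && s[0] < 0x7F) ? 1 : 0;
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */
    if (n < len) return 0;          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF) return 0;     /* overlong or out of range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates */
    return cp >= 0xA0 ? len : 0;    /* rejects C1 controls U+0080..U+009F */
}

/* Copy in[0..n) to out, passing printable characters through and
   quoting every other byte as \xHH.  out must hold 4*n + 1 bytes. */
static void escape_utf8(const char *in, size_t n, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0;

    assert(in != NULL && out != NULL);
    while (i < n) {
        size_t len = utf8_printable_char(s + i, n - i);
        if (len > 0) {              /* extend the printable prefix */
            memcpy(out, s + i, len);
            out += len;
            i += len;
        } else {                    /* quote one byte, then try again */
            out += sprintf(out, "\\x%02x", (unsigned)s[i]);
            i++;
        }
    }
    *out = '\0';
}
```

With this sketch the bytes C3 AA pass through unchanged, while a stray
AA between "a" and "b" comes out as a\xaab.  Because UTF-8 is
self-synchronising, quoting the one offending byte is enough for the
loop to pick up again at the next character boundary.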

You can find the longest printable prefix of a string by using
mbstowcs(3) with a DEST argument of NULL.  This function is in ISO C
since 1995, and so should be fairly portable nowadays.
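
A locale-aware way of measuring the printable prefix might look like
the sketch below.  It swaps in a different technique from the one
named above: as far as I can tell, mbstowcs(3) with a NULL destination
reports only a count, or (size_t)-1, for the string as a whole and
does not test printability, so the sketch instead steps through the
string with mbrtowc(3) and checks each character with iswprint(3).
The name printable_prefix_len is made up, and treating an embedded NUL
as unprintable is an assumption.

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), which the caller is expected to invoke */
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Number of leading bytes of s[0..n) that form complete, printable
   characters in the current LC_CTYPE locale. */
static size_t printable_prefix_len(const char *s, size_t n)
{
    mbstate_t st;
    size_t i = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* initial conversion state */
    while (i < n) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s + i, n - i, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            break;                  /* invalid or truncated sequence */
        if (r == 0)
            break;                  /* embedded NUL: treated as unprintable */
        if (!iswprint((wint_t)wc))
            break;                  /* valid character, but not printable */
        i += r;
    }
    return i;
}
```

The result depends on the locale in effect, so the caller would
normally run setlocale(LC_CTYPE, "") once at start-up.  The byte at
the returned offset is the one to quote before trying again on the
rest of the string.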

                                        Juliusz

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
