> no_escape po c | c `elem` poNoEscX po = True
> no_escape _ '\t' = True -- tabs will likely be converted to spaces
> no_escape _ '\n' = True
> -no_escape po c = if (poIsprint po) then isPrint c
> +no_escape po c = if (poIsprint po) then c_isprint c
> else isPrintableAscii c
> || c >= '\x80' && po8bit po

I think what you're seeing is that this doesn't work in multi-byte
locales, such as UTF-8 ones.

In a multi-byte locale, a single byte is not printable or unprintable
in itself; it is characters that are printable or not, and a character
can be represented by multiple bytes.  For example, in UTF-8 the byte
sequence

    C3 AA

represents a French ``e-circumflex'', but it doesn't make sense to ask
whether C3 or AA is printable.

An approach that should work well in self-synchronising encodings (for
example UTF-8) would be to find the longest printable prefix of the
given string (possibly 0 bytes), then quote one byte, and try again.
Your mileage will vary with non-self-synchronising encodings -- it
should work fairly well with EUC, but will most probably break down
completely with ISO-2022-JP.
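
A rough sketch of the loop described above, in C since the primitives
involved are C library ones.  This is not darcs code: the helper name
quote_unprintable and the \xNN output format are assumptions made for
illustration.  It decodes one character at a time with mbrtowc(3),
keeps it if iswprint(3) accepts it, and otherwise quotes a single byte
and retries from the next one, which should come to the same thing as
repeatedly taking the longest printable prefix.  Tab and newline are
passed through, as in the quoted patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: copy `in` to `out`, keeping characters that the
   current locale considers printable (plus tab and newline) and writing
   every other byte as \xNN.
   `out` must have room for 4 * strlen(in) + 1 bytes. */
static void quote_unprintable(const char *in, char *out)
{
    assert(in != NULL && out != NULL);

    size_t len = strlen(in);
    size_t i = 0;
    mbstate_t st;
    memset(&st, 0, sizeof st);

    while (i < len) {
        wchar_t wc;
        /* Try to decode one (possibly multi-byte) character at offset i. */
        size_t n = mbrtowc(&wc, in + i, len - i, &st);
        int decoded = n != (size_t)-1 && n != (size_t)-2 && n > 0;

        if (decoded &&
            (iswprint((wint_t)wc) || wc == L'\t' || wc == L'\n')) {
            /* Printable: copy all of its bytes unchanged. */
            memcpy(out, in + i, n);
            out += n;
            i += n;
        } else {
            /* Invalid, truncated or unprintable: quote exactly one byte,
               reset the conversion state, and try again one byte later. */
            out += sprintf(out, "\\x%02x", (unsigned char)in[i]);
            i += 1;
            memset(&st, 0, sizeof st);
        }
    }
    *out = '\0';
}
```

The result depends on the current locale, so the caller is assumed to
have run setlocale(LC_ALL, "") beforehand; in the default "C" locale
only ASCII should come through unquoted.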

You can find the longest printable prefix of a string by using
mbstowcs(3) with a DEST argument of NULL.  This function has been in
ISO C since 1995, and so should be fairly portable nowadays.
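
For illustration, a minimal sketch of the NULL-destination call; the
wrapper name count_chars is hypothetical.  As far as I know, POSIX
defines this form as returning the number of characters the string
would convert to, or (size_t)-1 if it hits an invalid sequence.  It
does not say where the invalid sequence is, nor whether the characters
are printable, so locating the exact prefix probably still needs a
character-by-character pass.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: number of characters in `s` under the current
   locale, or -1 if `s` contains an invalid multi-byte sequence. */
static long count_chars(const char *s)
{
    assert(s != NULL);

    /* With a NULL destination nothing is stored; only the count
       (or the error value) is returned. */
    size_t n = mbstowcs(NULL, s, 0);
    return n == (size_t)-1 ? -1L : (long)n;
}
```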

Juliusz