On Mon, Feb 11, 2013 at 3:05 PM, Matthieu Moy wrote:
> Erik Faye-Lund <kusmab...@gmail.com> writes:
>> But isn't UTF-8 constructed to be very unlikely to clash with existing
>> encodings? If so, I could add a case for non-ascii and non-UTF-8, that
>> simply writes the byte as a hex-tuple?
> If it's non-ascii and non-UTF-8, I think you'd want to display the byte
> as it is, because this is how it was entered. IOW, I'd say we should
> keep the current behavior in this case.
Yes, you are of course right. We should detect UTF-8, and only do
anything special in those cases, because the likely alternatives are
other 8-bit encodings (which the terminal should already grok, since
the user was able to input them) or other multi-byte encodings (which
are already broken, and are tricky to handle). So at least we'd only
break in very unlikely cases.
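
To make "detect UTF-8" concrete, here is a rough sketch of what such a
check could look like. The name utf8_seq_len is made up for
illustration and is not an existing git function; it only assumes that
we have a pointer to the bytes and the number of bytes remaining.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from git): return the length in bytes of the
 * well-formed UTF-8 sequence starting at s, or 0 if the bytes at s are
 * not valid UTF-8 (or the buffer ends in the middle of a sequence).
 */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
	size_t len, i;
	unsigned long cp;

	if (!n)
		return 0;
	if (s[0] < 0x80)
		return 1;	/* plain ASCII */
	else if ((s[0] & 0xe0) == 0xc0) { len = 2; cp = s[0] & 0x1f; }
	else if ((s[0] & 0xf0) == 0xe0) { len = 3; cp = s[0] & 0x0f; }
	else if ((s[0] & 0xf8) == 0xf0) { len = 4; cp = s[0] & 0x07; }
	else
		return 0;	/* stray continuation or invalid lead byte */

	if (n < len)
		return 0;	/* truncated sequence */

	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		cp = (cp << 6) | (s[i] & 0x3f);
	}

	/* reject overlong forms, surrogates and out-of-range values */
	if ((len == 2 && cp < 0x80) ||
	    (len == 3 && cp < 0x800) ||
	    (len == 4 && cp < 0x10000) ||
	    (cp >= 0xd800 && cp <= 0xdfff) ||
	    cp > 0x10ffff)
		return 0;
	return len;
}
```

A return value of 0 would mean "not UTF-8 here", in which case the
caller keeps the current behavior and emits the byte unchanged.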
But, I wonder, could mbrlen be used to detect the length instead? It
consults LC_CTYPE to find out what encoding to use, which seems like
it might give the correct answer in all non-corrupted cases... I'm far
from an expert on UNIX-internationalization, though. And this approach
is likely to break on Windows, but I suspect that we can perform some
well-placed hack for it, as we already know that we're doing UTF-8.
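
For reference, a minimal sketch of the mbrlen idea. The wrapper name
locale_char_len is hypothetical, and it assumes the program has already
called setlocale(LC_CTYPE, "") so that mbrlen() follows the user's
locale. Bytes that mbrlen() cannot decode are treated as one-byte
characters so they pass through untouched.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: length in bytes of the next character in s
 * according to the current LC_CTYPE locale. Bytes that mbrlen()
 * rejects count as a single byte, so the caller can emit them as-is.
 */
static size_t locale_char_len(const char *s, size_t n)
{
	mbstate_t state;
	size_t len;

	if (!n)
		return 0;
	memset(&state, 0, sizeof(state));	/* initial conversion state */
	len = mbrlen(s, n, &state);
	if (len == (size_t)-1 || len == (size_t)-2)
		return 1;	/* invalid or incomplete sequence */
	if (len == 0)
		return 1;	/* a NUL byte still occupies one byte */
	return len;
}
```

Whether this gives the right answer depends entirely on LC_CTYPE
matching the encoding of the data, which is the part I'm unsure about.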