Stefan Monnier wrote:
>>> Also IIRC a perfectly valid utf-8 buffer may contain eight-bit-* chars, used
>>> to keep track of valid unicode chars that have no corresponding character in
>>> emacs-mule. So the presence of eight-bit-* chars does not imply that the
>>> utf-8 encoded form of the text will contain an invalid utf-8 byte sequence.
>>>
>
>
>> Yes, but such eight-bit-* chars can be detected by checking
>> `untranslated-utf-8' property.
>>
>
> Sure, but the current code doesn't do that.
>
>
>>>> And, if Emacs owns a unibyte string, perhaps the right thing
>>>> is to make it multibyte according to the current
>>>> lang. env. (by string-make-multibyte) at first, then encode
>>>> it by utf-8.
>>>>
>
>
>>> That sounds terribly fragile/buggy.
>>>
>
>
>> Then, what do you think Emacs should do in such a case?
>>
>
> I think we can't know what should be done, so we should strive for
> simplicity and try to avoid losing information. I.e. just return the
> unibyte string as-is.
>
That is exactly the problem the original report was about: Gtk+
applications print big warnings. And there is no agreed-upon selection
type that represents just bytes.
W.r.t. the standards, Emacs has two choices: return a valid UTF-8 string
or not return anything at all. I'm beginning to think the second
option is the best.
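
To make the "valid UTF-8 or nothing" choice concrete, here is a rough
sketch in C of what such a check could look like. This is not the code
in Emacs; the function names utf8_valid_p and utf8_selection_or_null are
made up for illustration, and the well-formedness rules are the usual
ones (no stray continuation bytes, no overlong forms, no surrogates,
nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if buf[0..len) is well-formed UTF-8.  */
bool
utf8_valid_p (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;        /* number of continuation bytes expected */
      unsigned long min;  /* smallest code point allowed at this length */
      unsigned long cp;   /* decoded code point */

      if (c < 0x80)
        {
          i++;            /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; min = 0x80; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; min = 0x800; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;     /* stray continuation byte, or 0xF8..0xFF */

      if (len - i - 1 < need)
        return false;     /* sequence truncated at end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

      /* Reject overlong forms, surrogates, and out-of-range values.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += need + 1;
    }
  return true;
}

/* Hypothetical policy: hand back the bytes only if they are valid
   UTF-8, otherwise NULL, meaning "refuse the request".  */
const unsigned char *
utf8_selection_or_null (const unsigned char *buf, size_t len)
{
  return utf8_valid_p (buf, len) ? buf : NULL;
}
```

With something like this, a unibyte string holding a lone Latin-1 byte
such as 0xE5 would be refused rather than passed on as bogus UTF-8.
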
Jan D.
_______________________________________________
emacs-pretest-bug mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug