[Fwd: unicode conversions]

seer26 Fri, 16 Aug 2002 09:37:53 -0700


Markus: i was wondering about libiconv: is there any plan to support
a fall-back character when performing conversions, as opposed to
always stopping conversion when a character with no destination
representation is encountered?


such as this:


iconv_t iconv_open( const char *to, const char *from, wchar_t fallback );
//fallback is a wchar_t which must be representable in "to", or
//else be 0, meaning that no fallback is to be used


//fallback to "?"
iconv_open( "iso-8859-1", "utf-8", '?' );

//no fallback: quit conversion when there is no destination encoding
//representation:
iconv_open( "sjis", "utf-8", (wchar_t)0 );
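
For comparison, something close to the proposed fallback can be approximated
on top of the existing three-call iconv API. The following is only a rough
sketch: convert_with_fallback is a hypothetical helper, not part of libiconv;
it assumes UTF-8 input, a stateless destination encoding in which the
fallback is a single byte, and an iconv that reports unconvertible
characters with EILSEQ (some implementations substitute a character of
their own instead).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of iconv): convert UTF-8 text in `in` to the
 * encoding `to`, writing `fallback` wherever iconv reports EILSEQ.
 * Assumes `to` is stateless and `fallback` is one valid byte in it.
 * Returns the number of bytes written to `out`, or (size_t)-1 on failure. */
size_t convert_with_fallback(const char *to, const char *in, size_t inlen,
                             char *out, size_t outcap, char fallback)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;   /* iconv wants char **; input is not modified */
    char *dst = out;
    size_t srcleft = inlen, dstleft = outcap;
    size_t result = 0;

    while (srcleft > 0) {
        if (iconv(cd, &src, &srcleft, &dst, &dstleft) != (size_t)-1)
            continue;         /* everything consumed; the loop ends */

        /* Only EILSEQ is recoverable here; E2BIG and EINVAL are not. */
        if (errno != EILSEQ || dstleft == 0) {
            result = (size_t)-1;
            break;
        }

        /* Skip one UTF-8 sequence: the lead byte plus continuation bytes. */
        do {
            src++;
            srcleft--;
        } while (srcleft > 0 && ((unsigned char)*src & 0xC0) == 0x80);

        *dst++ = fallback;
        dstleft--;
    }

    iconv_close(cd);
    return result == (size_t)-1 ? result : outcap - dstleft;
}
```

Passing "ISO-8859-1" as `to` and '?' as `fallback` would then correspond to
the first call above: text containing a euro sign between "a" and "b" should
come out as "a?b" where iconv stops with EILSEQ. A real fallback argument to
iconv_open would still be cleaner, since the library knows the destination
encoding's state and this loop does not.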

--- Begin Message ---

thanks for the response bram,

> > I was thinking _VIM_TEXT could be a 1-byte motion-type,
> > then utf-8 text. There is no backwards compatibility to
> > break, because there is no standard atm, afaik.
> 
> Conversion between 'encoding' and utf-8 will not always be possible.
> It's not backwards compatible either, since an older Vim expects no
> conversion (assuming you run two versions of Vim for some reason).

Hrm, I was under the impression that converting from
non-unicode to unicode was always possible. And the reverse is possible
to the extent that the desired characters are representable in the
destination encoding: if they are not, then pasting what's left is
still valid; more so than just dumping in the text arbitrarily (as
basically binary).

Unfortunately, while experimenting with my system iconv, it appears
to instead stop when there is no destination encoding for a character,
rather than allowing a fallback to a default character. This can still
work: you just get the first portion of the string which is valid in the
destination encoding, and none after that, which is at least as valid
as ramming whatever's there in as binary imo.

Other converters are configurable in this sense: you could set them up
to convert between two encodings and to use a "?" for example when
there is no destination representation.
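
One example of that kind of configuration, as far as I know, is the
"//TRANSLIT" suffix that GNU libiconv and glibc accept on the destination
name in iconv_open. A minimal sketch follows; convert_translit is a
hypothetical name, the suffix is an extension rather than standard iconv,
and what actually gets substituted (an approximation such as "EUR", a "?",
or an error) depends on the implementation and locale.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 to ISO-8859-1, asking the converter to
 * transliterate characters that have no destination representation.
 * "//TRANSLIT" is a GNU extension; other iconv implementations may reject
 * or ignore it. Returns bytes written to `out`, or (size_t)-1 on failure. */
size_t convert_translit(const char *in, size_t inlen,
                        char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("ISO-8859-1//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;   /* iconv wants char **; input is not modified */
    char *dst = out;
    size_t srcleft = inlen, dstleft = outcap;

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : outcap - dstleft;
}
```

This keeps the conversion going where the converter has a substitute, but
the caller cannot choose the substitute character.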
 
> This would require a new atom to be used for the clipboard, which
> includes the name of the encoding.  The receiver of the clipboard can
> then attempt conversion.  This isn't very difficult.

I was thinking that such machinery would be redundant, in that unicode
was supposed to be canonical...

Does anyone else agree/disagree?

--- End Message ---
