[EMAIL PROTECTED] said:
>
>>Iconv is just clumsy. You can't even make (sane) wrappers to do this
>>stuff. It's as if it were designed by people just converting big chunks
>>of raw text. Maybe it's just me but I'm not seeing that in real world apps.
>
> On the other hand, the iconv API is more flexible the way it is. It
> can handle strings with embedded zeroes,

Now *that* is rare. For that use iconv.

> plus it can deal with strings
> that are not null terminated. If it assumed the input was zero terminated
> it could not do either of those things.

Just because the conversion routine stops at a null terminator in the
source doesn't mean it cannot operate on a string that is not null
terminated. The encdec interface I described can convert non-null
terminated strings by limiting the number of bytes inspected in src
using the sn parameter.

> The problem is at whichever point in your application where you lose the
> length data on your strings, there is the problem.

You're right! I'll get on the horn and open a trouble ticket with
Microsoft support right away :)

> It's simply not safe to
> store a null-terminated buffer in an unknown encoding - it doesn't even make
> sense to do so. Either convert to a known encoding earlier, or else keep
> the length data along with the string's data pointer.

I never said the encoding was unknown. I said it was predefined or
negotiated. There are many applications that maintain the same data
structures but permit strings to be encoded differently. Consider HTML
pages and MIME messages with bogus length parameters. The W3C claims all
apps should use UTF-16 internally so if you want to use those in your
application you need to convert them. Whenever I develop a string
handling function [1] I try to do it so that it can use the locale
encoding (e.g. ISO-8859-1, UTF-8, etc) or wchar_t but that doesn't change
how those strings are handled. The CIFS networking protocol will
negotiate the character encoding as either UCS-2LE (which is sometimes
really UTF-16LE) or the locale 8-bit codepage.

Mike

[1] i.e. http://www.ioplex.com/~miallen/libmba/dl/src/csv.c

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
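
To illustrate the point above about stopping at a null terminator while
still bounding the read with sn, here is a minimal sketch. It is not the
encdec code. The function name latin1_to_utf8 and the dst/dn parameters
are hypothetical, made up for this example; only src and sn come from
the message, and the ISO-8859-1 to UTF-8 conversion was picked simply
because it is short.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the encdec API: convert ISO-8859-1 to UTF-8.
 * Inspects at most sn bytes of src and stops early at a null byte, so
 * src does not have to be null terminated.
 * Writes at most dn bytes to dst, including a terminating null.
 * Returns the number of bytes written, excluding the terminator. */
size_t latin1_to_utf8(const unsigned char *src, size_t sn,
                      unsigned char *dst, size_t dn)
{
    size_t i, o = 0;

    if (dn == 0)
        return 0;

    for (i = 0; i < sn && src[i] != '\0'; i++) {
        unsigned char c = src[i];
        size_t need = (c < 0x80) ? 1 : 2;

        if (o + need >= dn)     /* keep one byte for the terminator */
            break;

        if (c < 0x80) {
            dst[o++] = c;       /* ASCIl range maps to itself */
        } else {
            /* U+0080..U+00FF becomes a two-byte UTF-8 sequence */
            dst[o++] = (unsigned char)(0xC0 | (c >> 6));
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[o] = '\0';
    return o;
}
```

With a buffer that has no terminator at all, passing sn = 2 converts
only the first two bytes. With a terminated buffer, passing a generous
sn still stops at the null. Either way the caller never has to know the
exact length up front, only an upper bound.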
