Erik de Castro Lopo wrote:
> Erik de Castro Lopo wrote:
>
> > Anyone care to clue me into the best way to deal with this?
>
> I've tried something like this:
>
> static void
> convert_string (const char * in, gchar * out, size_t maxlen)

So you just have a char* that's supposed to contain text, but you don't know the encoding? In that case all you really have is bytes, and without knowing the encoding you have no way to turn them into text. So you can't produce a UTF-8 representation of that text, because you don't have the text.

You really only have three options:

1) Find out what the encoding is. In another message you say it's from an ID3 tag… a quick glance at the Wikipedia article suggests ID3v1 doesn't specify an encoding. ID3v2 apparently does, but I wouldn't be shocked to find bad data in those tags anyway.

2) Guess by inspecting the bytes. Algorithms for this can be fairly complicated and will still be wrong in many cases, so it's probably not worth the effort.

3) Give up. If the bytes aren't valid UTF-8, pretend they are Latin-1 (ISO-8859-1). It'll probably be wrong, but decoding as Latin-1 will always produce something, even if it is mojibake. Or just tell the user that their data is bad and don't try to display it.

This is a decent introduction to handling non-ASCII text: <http://www.joelonsoftware.com/articles/Unicode.html>.

-Andrew.

_______________________________________________
coders mailing list
coders@slug.org.au
http://lists.slug.org.au/listinfo/coders
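
P.S. For what it's worth, option 3 could look roughly like the sketch below. It is plain C with no GLib dependency, so it uses char where the original prototype has gchar (GLib defines gchar as char). The names is_valid_utf8 and to_utf8_or_latin1 are made up for illustration, and I'm assuming maxlen is the size of the out buffer including the terminating NUL. Treat it as a starting point, not a tested implementation of the original convert_string.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0.
 * Rejects overlong forms, surrogates and code points above U+10FFFF. */
static int
is_valid_utf8 (const char *s)
{
    /* Smallest code point allowed for a sequence with N continuation bytes. */
    static const unsigned int min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *) s;

    while (*p) {
        unsigned int cp;
        int extra;

        if (*p < 0x80) { p++; continue; }            /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return 0;                               /* stray or bad lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)                 /* also catches the NUL */
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF
                || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

/* Write a UTF-8 version of 'in' to 'out' (at most maxlen bytes, NUL included).
 * Valid UTF-8 is copied as-is; anything else is decoded as Latin-1. */
static void
to_utf8_or_latin1 (const char *in, char *out, size_t maxlen)
{
    size_t o = 0;

    if (maxlen == 0)
        return;

    if (is_valid_utf8 (in)) {
        size_t len = strlen (in);
        if (len >= maxlen) {
            /* Truncate, backing up so we never cut a character in half. */
            len = maxlen - 1;
            while (len > 0 && ((unsigned char) in[len] & 0xC0) == 0x80)
                len--;
        }
        memcpy (out, in, len);
        out[len] = '\0';
        return;
    }

    /* Latin-1 bytes map directly to U+0000..U+00FF, so every byte >= 0x80
     * becomes exactly two bytes of UTF-8. */
    for (const unsigned char *p = (const unsigned char *) in; *p; p++) {
        if (*p < 0x80) {
            if (o + 1 >= maxlen) break;
            out[o++] = (char) *p;
        } else {
            if (o + 2 >= maxlen) break;
            out[o++] = (char) (0xC0 | (*p >> 6));
            out[o++] = (char) (0x80 | (*p & 0x3F));
        }
    }
    out[o] = '\0';
}
```

If you're already linking against GLib, g_utf8_validate() and g_convert() with "ISO-8859-1" as the source codeset should be able to replace both helpers, though I haven't tried that combination on ID3 data.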