On Tue, 2005-12-20 at 23:46 +0100, Ole Laursen wrote:
> Stephan Puchegger <[EMAIL PROTECTED]> writes:
>
> >> It would be nice to figure out in my program for /each/ file I read,
> >> in which character set it is encoded. Is this possible? I only found
> >> functions so far which can either read the locale's character set or
> >> check if some filename is valid UTF-8 (or not), but no function
> >> which individually probes for a certain file in which character set
> >> its filename is encoded.
> >
> > I am no expert in "character-string encodings", but I guess that this is
> > not possible, since no string contains information about the actual
> > encoding type. The only thing it contains is the encoded string itself.
> > The encoding type is usually taken from the locale if I am not
> > completely mistaken.
>
> This is an old thread but: most browsers have an auto-detect option for
> guessing the encoding of web pages because web authors are sloppy and
> forget to specify what encoding they are using.
>
> With file names, you would only have very little data to base the
> guess on, but on the other hand you probably only have to worry about
> two encodings, UTF-8 and the encoding of the locale. So it should be
> doable. In fact, I think trying UTF-8 and then falling back to the
> encoding of the locale if the file name is not valid UTF-8 will get
> you through most cases intact, at least for European languages.

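To make the "try UTF-8, then fall back to the locale" idea concrete, here
is a rough sketch in Python rather than gtkmm, only because it is short
and easy to run. The function name decode_filename and its
fallback_encoding parameter are made up for illustration; they are not
part of glib or gtkmm. It assumes the file name is available as raw bytes.

```python
import locale
from typing import Optional, Tuple


def decode_filename(raw: bytes,
                    fallback_encoding: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding_used) for a raw file name.

    Hypothetical helper: tries strict UTF-8 first, then the locale's
    encoding (or an explicitly given fallback encoding).
    """
    # Step 1: strict UTF-8. Text in a legacy 8-bit encoding rarely
    # happens to be valid UTF-8, so success here is a strong hint.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 2: fall back to the locale's encoding unless one was given.
    enc = fallback_encoding or locale.getpreferredencoding(False)
    try:
        return raw.decode(enc), enc
    except (UnicodeDecodeError, LookupError):
        # Last resort: never fail, but mark the bytes we could not decode.
        return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

For example, decode_filename(b"gr\xf8d.txt", "latin-1") takes the fallback
path, because the byte 0xF8 is not valid UTF-8 in that position. Pure
ASCII names always succeed in step 1, so the guess only matters for names
with non-ASCII characters.
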
I _think_ that regexxer (a gtkmm app) guesses encodings and has a
fallback. gedit probably does too, because plain text files don't have
encoding information.

-- 
Murray Cumming
[EMAIL PROTECTED]
www.murrayc.com
www.openismus.com

_______________________________________________
gtkmm-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/gtkmm-list
