It is possible to check that an octet represents a graphic of some sort by using uselocale() and isprint(), but this does not do any iconv()-style conversion that matches up code point names. So what's on the screen will still be garbage. It will simply be garbage made of glyphs you're used to seeing, not some other language's glyphs or the box-drawing glyphs of CP437. For non-garbage output the application still needs to set the locale to the code page of the data; that code page cannot be assumed.
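As a rough sketch of the check described above, the helper below counts the bytes that isprint() accepts while the calling thread is switched to a named locale with uselocale(). The function name count_printable and the use of newlocale()/freelocale() to obtain the locale object are assumptions made for illustration; it assumes a POSIX.1-2008 system. Note that it only classifies bytes; nothing is converted.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes in buf that isprint() accepts
 * while this thread uses the LC_CTYPE rules of the locale `name`.
 * Returns -1 if that locale cannot be created. */
long count_printable(const unsigned char *buf, size_t len, const char *name)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    locale_t old = uselocale(loc);   /* affects the calling thread only */
    long n = 0;
    for (size_t i = 0; i < len; i++)
        if (isprint(buf[i]))         /* classification only, no conversion */
            n++;
    uselocale(old);                  /* restore the previous locale */
    freelocale(loc);
    return n;
}
```

Calling count_printable(data, len, "C") and then again with the name of an 8-bit locale installed on the system will generally give different counts for the same bytes, and neither result says which glyphs the terminal will actually draw.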
Also, on a byte-by-byte basis, for a byte-oriented read of a text file, UTF-8 and ISO-8859-1 are NOT equivalent: a single UTF-8 byte can encode only 95 graphics, with the remaining byte values being control codes or unassigned/illegal, versus 191 graphics and 65 control codes in ISO-8859-1. What can be considered equivalent is processing a UCS-2 encoded file with a custom getc() that ignores all nulls. Then, if the data consists only of UCS-2 code points below 256, the string read will look as if it were 8859 encoded. However, because UCS-2 permits arbitrary designation of the C0 and C1 sets, some control codes may not be the same, and you can therefore still have garbage.

On Tuesday, January 8, 2019 Joerg Schilling <joerg.schill...@fokus.fraunhofer.de> wrote:

Robert Elz <k...@munnari.oz.au> wrote:

> Date: Tue, 8 Jan 2019 12:51:16 +0100
> From: Joerg Schilling <joerg.schill...@fokus.fraunhofer.de>
> Message-ID: <5c348eb4.tc7thjo20z6olugw%joerg.schill...@fokus.fraunhofer.de>
>
> | e.g. because Unicode is "based" on ISO-8859-1 in that the low 256 values in the
> | UNICODE encoding is identical to the encoding used by ISO-8859-1.
>
> That's not a rational reason for assuming that any data which is not UTF-8
> is 8859-1 (or 10646-1). If a utf-8 decode fails, the only solution that
> works reliably is for someone to tell the software which encoding it is
> (well, that's true, even if it is utf-8).

I am talking about text files and since there is the way to verify whether the
value is a printable ISO-8859-1 character. It is simple to write a program that
correctly outputs text in either UTF-8 or ISO-8859-1 without being given the
actual encoding if you just expect one of these two encodings.

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
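The 95-versus-191 graphic counts given at the top of this message for UTF-8 and ISO-8859-1 can be checked by classifying all 256 byte values. This is only a sketch: the function names are made up for illustration, and "UTF-8" here means a single byte looked at in isolation, as in a byte-oriented read.

```c
#include <stddef.h>

/* ISO-8859-1: graphic characters occupy 0x20-0x7E and 0xA0-0xFF. */
int latin1_is_graphic(unsigned char b)
{
    return (b >= 0x20 && b <= 0x7E) || b >= 0xA0;
}

/* UTF-8, one byte taken in isolation: only 0x20-0x7E is a graphic.
 * 0x00-0x1F and 0x7F are controls; 0x80-0xFF are continuation bytes,
 * lead bytes of longer sequences, or bytes that are never valid. */
int utf8_single_byte_is_graphic(unsigned char b)
{
    return b >= 0x20 && b <= 0x7E;
}

/* Count how many of the 256 byte values satisfy the predicate. */
int count_bytes(int (*pred)(unsigned char))
{
    int n = 0;
    for (int b = 0; b < 256; b++)
        if (pred((unsigned char)b))
            n++;
    return n;
}
```

With these definitions, count_bytes(latin1_is_graphic) gives 191, leaving 65 control codes, and count_bytes(utf8_single_byte_is_graphic) gives 95.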