On Fri, 2008-08-29 at 01:16 -0400, der Mouse wrote:
> > 2) Scan all the strings in the current document for non-latin-1 (e.g.
> > UTF-8) characters
>
> I must have misunderstood something here.  A string of octets may
> simultaneously be valid Latin-1 text and valid UTF-8 text (for example,
> 0xce 0xa3 is UTF-8 for Greek capital sigma, U+03A3, but is also Latin-1
> for the two-character sequence capital-I-circumflex pound-sign).
>
> Or does the "current document" being scanned store text in some way
> which does not have this ambiguity?
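To make the ambiguity and the proposed scan concrete, here is a minimal Python sketch. It assumes the document's strings are available as raw UTF-8 bytes; the helper names `has_non_ascii` and `non_latin1_chars` are hypothetical, not gEDA functions.

```python
def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte lies outside 7-bit ASCII (value 0x80 or above)."""
    return any(b >= 0x80 for b in data)


def non_latin1_chars(data: bytes) -> list:
    """Decode as UTF-8 and return the characters Latin-1 cannot represent
    (code points above U+00FF)."""
    return [ch for ch in data.decode("utf-8") if ord(ch) > 0xFF]


if __name__ == "__main__":
    sigma = b"\xce\xa3"
    # The same two octets read differently depending on the assumed encoding.
    print(ascii(sigma.decode("utf-8")))    # one character, U+03A3
    print(ascii(sigma.decode("latin-1")))  # two characters, U+00CE U+00A3
    print(has_non_ascii(b"R1 10k"))        # plain ASCII attribute text
    print(has_non_ascii(sigma))
    print(non_latin1_chars(sigma))
```

The byte scan is the cheap test: it flags every non-ASCII character, including accented Latin-1 letters such as "é" (0xc3 0xa9 in UTF-8). The decode-based check is stricter and reports only characters that genuinely cannot be saved as Latin-1.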
I presume that Mike means we scan the UTF-8 string for bytes with values above 127 (0x80 and up). The 7-bit ASCII codes are common to Latin-1 and UTF-8, so any byte in that range marks a character outside plain ASCII.

--
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)

_______________________________________________
geda-dev mailing list
http://www.seul.org/cgi-bin/mailman/listinfo/geda-dev
