On Fri, 2008-08-29 at 01:16 -0400, der Mouse wrote:
> > 2) Scan all the strings in the current document for non-latin-1 (e.g. 
> > UTF-8) characters
> 
> I must have misunderstood something here.  A string of octets may
> simultaneously be valid Latin-1 text and valid UTF-8 text (for example,
> 0xce 0xa3 is UTF-8 for Greek capital sigma, U+03A3, but is also Latin-1
> for the two-character sequence capital-I-circumflex pound-sign).
> 
> Or does the "current document" being scanned store text in some way
> which does not have this ambiguity?

I presume that Mike means we scan the UTF-8 string for bytes with value
>= 128 (i.e. with the high bit set). The 7-bit ASCII codes are common to
both Latin-1 and UTF-8.
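
A minimal sketch of that kind of scan, in Python purely for illustration
(it is not taken from the gEDA sources, and the function names are
hypothetical). It reports the first byte outside the 7-bit ASCII range,
and shows that such a check only tells us "not plain ASCII"; it cannot by
itself say whether the bytes were meant as Latin-1 or UTF-8.

```python
def first_non_ascii(data: bytes) -> int:
    """Return the index of the first byte >= 0x80, or -1 if all are 7-bit ASCII."""
    for i, b in enumerate(data):
        if b >= 0x80:
            return i
    return -1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "cafe" with an e-acute (U+00E9), encoded as UTF-8: b'caf\xc3\xa9'
sample = "caf\u00e9".encode("utf-8")

print(first_non_ascii(b"plain ascii"))  # -1: nothing to convert
print(first_non_ascii(sample))          # 3: first high-bit byte
print(is_valid_utf8(sample))            # True
print(is_valid_utf8(b"caf\xe9"))        # False: lone Latin-1 byte
# Any byte sequence decodes as Latin-1, so the same octets also read as
# two Latin-1 characters (A-tilde, copyright sign) after "caf".
print(len(sample.decode("latin-1")))    # 5
```

Trying a strict UTF-8 decode first, and only falling back to Latin-1 when
it fails, is one common heuristic, but as the quoted example shows it can
still guess wrong for short strings.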

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)



_______________________________________________
geda-dev mailing list
[email protected]
http://www.seul.org/cgi-bin/mailman/listinfo/geda-dev
