[quoted lines by Lee Maschmeyer on 2012/07/03 at 14:07 -0400]
>It turns out the problem is in the lynx dump of the xml file. Although that
>file contains protégé, when lynx finishes dumping it it becomes:
>
>protÃ©gÃ©
>
>which seems to be 0xc383c2a9
Okay, two things are at work here.

First, the file is encoded in UTF-8, but your lynx isn't configured to detect this, so it assumes the file is encoded in Latin1 (ISO 8859-1).

Second, your lynx is configured to use ASCII equivalents for non-ASCII characters. Usually this means displaying a letter without its accent.

So, it goes like this: The original character, é (lowercase e with acute), is E9. Encoded in UTF-8, this character appears in your file as the two bytes C3 A9. Lynx, assuming the file is encoded in Latin1, sees these as the two separate characters C3 and A9. It displays C3, which is an uppercase A with a tilde (Ã), as a plain uppercase A, and it displays A9, which is the copyright symbol (©), as itself. That is why you're seeing A©.

Now for the next mystery: why you're then seeing C383C2A9. C383 is UTF-8 for C3, and C2A9 is UTF-8 for A9, whereas C3A9 is UTF-8 for the original character, é (E9). In other words, something you did treated the two bytes in your file, which together are the UTF-8 encoding of the single character é, as two separate characters, and then encoded each of those characters in UTF-8 again.

--
Dave Mielke | 2213 Fox Crescent | The Bible is the very Word of God.
Phone: 1-613-726-0014 | Ottawa, Ontario | 2011 May 21 is the End of Salvation.
EMail: [email protected] | Canada K2A 1H7 | http://Mielke.cc/now.html
http://FamilyRadio.com/ | http://Mielke.cc/bible/
_______________________________________________
This message was sent via the BRLTTY mailing list.
To post a message, send an e-mail to: [email protected]
For general information, go to: http://mielke.cc/mailman/listinfo/brltty
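The byte sequences walked through in the reply above can be checked with a short Python sketch. It is an illustration only, and the helper names are hypothetical: `strip_accents` is NOT lynx's actual ASCII-equivalents table, just an approximation (Unicode NFKD decomposition with combining marks dropped) that happens to give the same A© result for these two characters. `undo_double_encoding` assumes the text was misread as Latin1 and re-encoded exactly once.

```python
import unicodedata


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Latin-1 (ISO 8859-1)."""
    return text.encode("utf-8").decode("latin-1")


def strip_accents(text: str) -> str:
    """Hypothetical stand-in for an 'ASCII equivalents' display (not lynx's
    real table): decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def undo_double_encoding(data: bytes) -> str:
    """Reverse one extra UTF-8 layer, assuming exactly one Latin-1 misread."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    # é is U+00E9; in UTF-8 it becomes the two bytes C3 A9.
    print("é".encode("utf-8").hex())  # c3a9

    # Read as Latin-1, those bytes become two characters: Ã (C3) and © (A9).
    misread = misread_as_latin1("protégé")
    print(misread)  # protÃ©gÃ©

    # Dropping the tilde from Ã leaves the A© described in the reply.
    print(strip_accents(misread))  # protA©gA©

    # Re-encoding the misread characters as UTF-8 gives C383 C2A9 per é.
    double = misread_as_latin1("é").encode("utf-8")
    print(double.hex())  # c383c2a9

    # Going back through Latin-1 recovers the original character.
    print(undo_double_encoding(double))  # é
```

The repair step only works when you know the data went through exactly one wrong Latin1 decode followed by a UTF-8 encode; applied to correctly encoded text, the Latin-1 step can fail with an encoding error.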
