On 02.08.2017 11:25, Ben RUBSON wrote:
We would then be able to correctly log 'André'
Actually, this is how I most often get it, on the web and in scam emails :
"Hi AndrÃ©,"
To the savvy and experienced multilingual-application-programming expert, this of course
is entirely transparent :
- the letters "Ã©" are in reality the (bad) interpretation, as ISO-8859-1, of the UTF-8
2-byte sequence \xc3\xa9 which, as you well know by now, represents the Unicode character
with codepoint 233 (decimal) (or E9 (hexadecimal)), which is the printable latin letter "é".
- the misinterpretation is due to some program in the chain leading to this HTML page or
email, which does not support UTF-8, or supports it incorrectly, and which has interpreted
this sequence as 2 single-byte characters, instead of 1 character.
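For anyone who wants to see it happen, the two points above can be reproduced in a few
lines. This is only a sketch in Python; the variable names are hypothetical and chosen
purely for illustration :

```python
# Illustration only: these names are hypothetical, not taken from any real program.
original = "é"                          # Unicode codepoint 233 (hex E9)
utf8_bytes = original.encode("utf-8")   # the 2-byte sequence \xc3\xa9

# A program that wrongly assumes ISO-8859-1 turns each byte into its own character.
misread = utf8_bytes.decode("iso-8859-1")

print(ord(original))   # 233
print(utf8_bytes)      # b'\xc3\xa9'
print(len(misread))    # 2 characters where there should be 1
```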
(It gets even funnier when some other program in the chain tries to do "the right thing"
and re-encodes these 2 characters as UTF-8, thus yielding 4 bytes which no longer have
anything to do with the original, no matter how encoded. And I am sure that anyone dealing with
the Chinese language has better stories to tell.)
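The double-encoding case can be sketched the same way, again in Python with hypothetical
names. The last step assumes that one has correctly guessed the exact chain of mistakes,
which in real life is rarely a given :

```python
# Illustration only: these names are hypothetical.
# Step 1: the UTF-8 bytes of "é" are misread as ISO-8859-1, giving 2 characters.
misread = "é".encode("utf-8").decode("iso-8859-1")

# Step 2: a well-meaning program re-encodes those 2 characters as UTF-8.
double_encoded = misread.encode("utf-8")
print(len(double_encoded))   # 4 bytes
print(double_encoded)        # b'\xc3\x83\xc2\xa9'

# Undoing the damage means reversing each wrong step, in the right order.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(repaired == "é")       # True
```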
Knowing how it happens does nothing to alleviate the frustration though.
It even happens with organisations which by all means /should/ really know
better.
The following is extracted from emails received from the *Apache httpd* Developers
mailing list :
[...]
Weird behaviour with mod_ssl and SSLCryptoDevice
85066 by: jean-frederic clere
85067 by: Stefan Eissing
85068 by: jean-frederic clere
85069 by: jean-frederic clere
85075 by: Jan Kaluža <------------
[...]
scoreboard and http2
85003 by: Stefan Eissing
85004 by: Graham Leggett
85005 by: Plüm, Rüdiger, Vodafone Group <---
Poor Rüdiger, who systematically sees his name mangled each time he contributes.
And who knows what Jan's name really looks like ?
Mind you, this is about Apache httpd, the webserver which powers more than 50% of
worldwide websites. So I believe that other character-set-confused programmers have some
excuse.
:-)