Re: Log and special characters

tomcat Wed, 02 Aug 2017 06:46:10 -0700

On 02.08.2017 11:25, Ben RUBSON wrote:

> We would then be able to correctly log 'André'


Actually, this is how I most often get it, on the web and in scam emails:

"Hi AndrÃ©,"

To the savvy and experienced multilingual-application-programming expert, this is of course entirely transparent:
- the letters "Ã©" are in reality the (bad) interpretation, as ISO-8859-1, of the UTF-8 2-byte sequence \xc3\xa9, which, as you well know by now, represents the Unicode character with codepoint 233 (decimal), or E9 (hexadecimal), which is the printable Latin letter "é".
- the misinterpretation is due to some program in the chain leading to this HTML page or email which does not support UTF-8, or supports it incorrectly, and which has interpreted these 2 bytes as 2 separate characters instead of 1.
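The mechanism can be reproduced in a few lines of Python. This is a minimal sketch, assuming the faulty program simply reads each UTF-8 byte as one ISO-8859-1 character; `misread_as_latin1` is a hypothetical helper name, not part of any library:

```python
def misread_as_latin1(text: str) -> str:
    """Encode text correctly as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    raw = text.encode("utf-8")    # "é" (U+00E9) becomes the 2 bytes C3 A9
    return raw.decode("latin-1")  # every byte is taken as a separate character

print("é".encode("utf-8"))        # b'\xc3\xa9'
print(ord("é"), hex(ord("é")))    # 233 0xe9
garbled = misread_as_latin1("Hi André,")
# garbled == "Hi AndrÃ©,"  (in ISO-8859-1, C3 is "Ã" and A9 is "©")
```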

(It gets even funnier when some other program in the chain tries to do "the right thing" and re-encodes these 2 characters as UTF-8, thus yielding 4 bytes which no longer have anything to do with the original, whatever encoding is assumed. And I am sure that anyone dealing with the Chinese language has better stories to tell.)
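The double damage can be sketched the same way, under the same assumption (the hypothetical `misread_as_latin1` helper is redefined so the snippet stands alone): the two misread characters are dutifully re-encoded as UTF-8, and "é" has now become 4 bytes.

```python
def misread_as_latin1(text: str) -> str:
    # UTF-8 bytes wrongly decoded as ISO-8859-1, one character per byte
    return text.encode("utf-8").decode("latin-1")

once = misread_as_latin1("é")   # "Ã©": 2 characters instead of 1
twice = once.encode("utf-8")     # "doing the right thing": re-encode as UTF-8
print(twice, len(twice))         # b'\xc3\x83\xc2\xa9' 4
# Decoding these 4 bytes as UTF-8 only gives back the mangled "Ã©", never "é"
assert twice.decode("utf-8") == once
```

"Ã" (U+00C3) is encoded as C3 83 and "©" (U+00A9) as C2 A9, so the original C3 A9 is nowhere to be found.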


Knowing how it happens does nothing to alleviate the frustration though.

It even happens with organisations which by all means /should/ really know 
better.

The following is extracted from emails received from the *Apache httpd* Developers mailing list:


[...]
Weird behaviour with mod_ssl and SSLCryptoDevice
        85066 by: jean-frederic clere
        85067 by: Stefan Eissing
        85068 by: jean-frederic clere
        85069 by: jean-frederic clere
        85075 by: Jan KaluÅ¾a  <------------


[...]
scoreboard and http2
        85003 by: Stefan Eissing
        85004 by: Graham Leggett
        85005 by: PlÃ¼m, RÃ¼diger, Vodafone Group  <---

Poor Rüdiger, who systematically sees his name mangled each time he contributes.
And who knows what Jan's name really looks like?
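When a string has gone through exactly one such misreading, it can usually be repaired by running the steps backwards. A minimal Python sketch, assuming no further damage (such as the double re-encoding described earlier) happened along the way, and assuming the digest shows the UTF-8 bytes read as ISO-8859-1; `undo_latin1_misreading` is a hypothetical name. If those assumptions hold, Jan's name comes out as "Kaluža":

```python
def undo_latin1_misreading(garbled: str) -> str:
    """Reverse a single UTF-8 -> ISO-8859-1 misreading."""
    # Each garbled character maps back to exactly one byte in ISO-8859-1 ...
    raw = garbled.encode("latin-1")
    # ... and those bytes are then decoded as the UTF-8 they always were
    return raw.decode("utf-8")

# Names as they appear in the digest, written with escapes to avoid glyph confusion
mangled = ["Pl\u00c3\u00bcm, R\u00c3\u00bcdiger", "Kalu\u00c5\u00bea"]
repaired = [undo_latin1_misreading(name) for name in mangled]
# repaired == ["Plüm, Rüdiger", "Kaluža"]
```

If the text was not in fact misread this way, `decode("utf-8")` will typically raise `UnicodeDecodeError` (or `encode("latin-1")` a `UnicodeEncodeError`), which makes a convenient sanity check.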

Mind you, this is about Apache httpd, the webserver which powers more than 50% of the world's websites. So I believe that other character-set-confused programmers have some excuse.

:-)
