Hi everyone,
Hi Platonides,

OK. Finding the "Hex UTF-8 bytes" representation of a "Hex code point" is not intuitive.
In the FAQ entry "What is UTF-8?" at http://www.cl.cam.ac.uk/~mgk25/unicode.html, I found part of the answer to my question.

Let's consider the "Hex code point" 0xC3. What is the sequence of bits used to represent that character as "Hex UTF-8 bytes"?

The binary representation of 0xC3 is 1100 0011. Since the first bit of this byte is 1 (and not 0), the value is above 0x7F and does not fit the one-byte form, so we use the following two-byte "pattern" to represent that code:

    110xxxxx 10xxxxxx

and replace each "x" with the proper bit. To do so, we read the binary representation of 0xC3 from right to left (bits are numbered 1 to 8 from the left, pattern positions 1 to 16 from the left) and fill the pattern from right to left as well:

- 8th bit of 0xC3: 1. Position 16 of the pattern becomes 1: 110xxxxx 10xxxxx1
- 7th bit of 0xC3: 1. Position 15 becomes 1: 110xxxxx 10xxxx11
- 6th bit of 0xC3: 0. Position 14 becomes 0: 110xxxxx 10xxx011
- 5th bit of 0xC3: 0. Position 13 becomes 0: 110xxxxx 10xx0011
- 4th bit of 0xC3: 0. Position 12 becomes 0: 110xxxxx 10x00011
- 3rd bit of 0xC3: 0. Position 11 becomes 0: 110xxxxx 10000011
- 2nd bit of 0xC3: 1. Positions 9 and 10 hold the fixed "10" prefix of the second byte, so we skip them and position 8 becomes 1: 110xxxx1 10000011
- 1st bit of 0xC3: 1. Position 7 becomes 1: 110xxx11 10000011

Then we replace the remaining "x" with zeros:

    11000011 10000011

The hexadecimal representation of 11000011 is 0xC3.
The hexadecimal representation of 10000011 is 0x83.

Hence the "Hex UTF-8 bytes" representation of 0xC3 is 0xC3 0x83.

Is that it?

Thanks and all the best,
--
Lmhelp
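P.S. To double-check the bit-filling steps above, here is a minimal Python sketch. The function name utf8_two_byte is my own and purely illustrative, and it only handles code points that need the two-byte pattern (0x80 to 0x7FF); it is not meant as a general UTF-8 encoder. It compares the hand-built bytes with what Python's built-in str.encode produces.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Fill the pattern 110xxxxx 10xxxxxx with the bits of code_point.

    Hypothetical helper for illustration; covers only 0x80..0x7FF.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte pattern only covers 0x80..0x7FF")
    low_six = code_point & 0b00111111   # the 6 rightmost bits go into 10xxxxxx
    high_five = code_point >> 6         # the remaining bits go into 110xxxxx
    first_byte = 0b11000000 | high_five
    second_byte = 0b10000000 | low_six
    return bytes([first_byte, second_byte])


encoded = utf8_two_byte(0xC3)
print(" ".join(f"0x{b:02X}" for b in encoded))

# Cross-check against Python's own UTF-8 encoder.
assert encoded == chr(0xC3).encode("utf-8")
```

Running it for 0xC3 should print the same two bytes as the manual derivation, and the assert fails if the hand-built result ever disagrees with the built-in encoder.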
