Hi everyone,
Hi Platonides,

Ok.
Finding the "Hex UTF-8 bytes" representation of a "Hex code point"
is not intuitive.

In the FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html,
under "What is UTF-8?", I found part of the answer to my question.

Let's consider the "Hex code point" 0xC3.
What is the sequence of bits used to represent that character
as "Hex UTF-8 bytes"?

The binary representation of 0xC3 is 1100 0011.
Since its first bit is 1 (and not 0), 0xC3 is greater than 0x7F and
cannot be stored in a single UTF-8 byte. It is at most 0x7FF, so it fits
in 11 bits, and we use the following two-byte "pattern" to represent
that code:
110xxxxx 10xxxxxx
and replace the "x"s with the proper bits.
To do so, we read the binary representation of 0xC3
from right to left:

- 8th bit of 0xC3's binary representation: 1
Replace the x at position 16 in 110xxxxx 10xxxxxx with 1:
110xxxxx 10xxxxx1

- 7th bit of 0xC3's binary representation: 1
Replace the x at position 15 in 110xxxxx 10xxxxx1 with 1:
110xxxxx 10xxxx11

- 6th bit of 0xC3's binary representation: 0
Replace the x at position 14 in 110xxxxx 10xxxx11 with 0:
110xxxxx 10xxx011

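- 5th bit: 0 (position 13)
110xxxxx 10xx0011

- 4th bit: 0 (position 12)
110xxxxx 10x00011

- 3rd bit: 0 (position 11)
110xxxxx 10000011

- 2nd bit: 1 (position 8)
110xxxx1 10000011

- 1st bit: 1 (position 7)
110xxx11 10000011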

Finally, replace the remaining "x"s (positions 4 to 6) with zeros:
11000011 10000011
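The right-to-left filling can be mechanized. Below is a minimal Python sketch; `fill_pattern` is a hypothetical helper name (not from any library), and it assumes the pattern is written as a string with a space between the bytes. Each "x" met while walking leftwards takes the lowest remaining bit of the code point, and once the bits run out the leftover "x"s naturally become 0, matching the last step.

```python
def fill_pattern(code_point: int, pattern: str = "110xxxxx 10xxxxxx") -> str:
    """Fill the x slots of a UTF-8 bit pattern with the bits of code_point,
    working from right to left (hypothetical helper, for illustration)."""
    chars = list(pattern)
    value = code_point
    # Walk the pattern from right to left, consuming one bit per "x".
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] == "x":
            chars[i] = str(value & 1)  # lowest remaining bit (0 once exhausted)
            value >>= 1
    if value:
        raise ValueError("code point has more bits than the pattern has x slots")
    return "".join(chars)


bits = fill_pattern(0xC3)
print(bits)                                    # 11000011 10000011
print([hex(int(b, 2)) for b in bits.split()])  # ['0xc3', '0x83']
```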

The hexadecimal representation of 11000011 is 0xC3.
The hexadecimal representation of 10000011 is 0x83.

Hence the "Hex UTF-8 bytes" representation of 0xC3 is 0xC3 0x83.
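The same bytes can be computed without filling the pattern by hand: the lead byte 110xxxxx carries the high 5 of the 11 bits and the continuation byte 10xxxxxx carries the low 6. A minimal Python sketch, assuming the code point lies in the two-byte range U+0080..U+07FF; `utf8_two_bytes` is a hypothetical name, and the result is cross-checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_two_bytes(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF using the pattern 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte pattern covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)    # 110xxxxx: the high 5 of the 11 bits
    cont = 0x80 | (code_point & 0x3F)  # 10xxxxxx: the low 6 bits
    return bytes([lead, cont])


encoded = utf8_two_bytes(0xC3)
print(" ".join(f"0x{b:02X}" for b in encoded))  # 0xC3 0x83
# Cross-check against Python's own UTF-8 encoder.
assert encoded == chr(0xC3).encode("utf-8")
```

Shifting and masking is equivalent to the bit-by-bit filling; the pattern version is simply easier to follow by hand.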

Is that it?

Thanks and all the best,
--
Lmhelp
-- 
View this message in context: 
http://old.nabble.com/Web-page-source---%22strange%22-characters-tp27999218p28028984.html
Sent from the WikiMedia General mailing list archive at Nabble.com.


_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
