On 13 Dec 2013, at 08:03 , 水静流深 <1248283...@qq.com> wrote:

> in http://www.ascii-code.com/, you can see the hex value of Œ is 8C,
> 

(Looks like Brian got his version mangled in transmission.)

Anything above 7F is not ASCII.

Various "8-bit extensions" put various non-ASCII characters at various places 
in the range 80-FF. Your reference shows Windows-1252, a superset of the 
Latin-1 encoding which covers the Western European languages (strict ISO 8859-1 
has only control codes at 80-9F, so Œ at 8C is a Windows-1252 position). That 
was useful for a while [*], until the West and the East began talking to each 
other and found that the other party's documents were putting different 
characters in the same places of different encodings.
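
To make the "same place, different character" problem concrete, here is a 
minimal sketch in R. It decodes the single byte 8C under two 8-bit code pages. 
The encoding names "CP1252" and "CP1251" are assumptions about what the iconv 
on your platform accepts; they are common, but not guaranteed everywhere.

```r
# One byte, 0x8C, with no encoding information attached.
b <- rawToChar(as.raw(0x8c))

# Decode the same byte under two different 8-bit code pages.
# "CP1252" and "CP1251" are assumed to be known to the local iconv.
western  <- iconv(b, from = "CP1252", to = "UTF-8")
cyrillic <- iconv(b, from = "CP1251", to = "UTF-8")

# Report the Unicode code point each decoding produced.
utf8ToInt(western)   # 338  = U+0152, the OE ligature
utf8ToInt(cyrillic)  # 1034 = U+040A, a Cyrillic capital letter
```

The byte is identical in both cases; only the table used to interpret it 
differs.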

UTF-8 uses multibyte sequences like c5 92 to represent extra characters, which 
allows you to have more than 128 of them.
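
As a rough illustration of where c5 92 comes from, the sketch below builds the 
two UTF-8 bytes by hand from the code point of Œ (U+0152) and compares them with 
what charToRaw() reports. The variable names are only illustrative, and the 
hand-rolled arithmetic covers just the two-byte case (code points 80 to 7FF), 
not UTF-8 in general.

```r
x  <- "\u0152"        # the OE ligature, written as an escape
cp <- utf8ToInt(x)    # 338, i.e. hex 152

# Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
lead  <- bitwOr(0xC0, bitwShiftR(cp, 6))   # top 5 bits    -> 0xC5
trail <- bitwOr(0x80, bitwAnd(cp, 0x3F))   # low 6 bits    -> 0x92

by_hand <- as.raw(c(lead, trail))
by_hand                           # c5 92
identical(by_hand, charToRaw(x))  # TRUE
```

So R is showing the UTF-8 bytes of the character, not its position in any 
single-byte table.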

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256
http://www.joelonsoftware.com/articles/Unicode.html

-pd

[*] A short while, actually, because it was preceded by another encoding mess 
known as IBM Code Pages. Famously, in this country, IBM computers (and many 
third-party printers!) shipped with a code page missing the Danish O-slash 
character (Ø), which got printed as "cent"/"Yen" instead!


> Why, in my R console, does
> charToRaw("Œ")
> [1] c5 92
> give c5 92 and not 8C?
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
