On 13 Dec 2013, at 08:03, 水静流深 <1248283...@qq.com> wrote:

> in http://www.ascii-code.com/, you can see the hex value of Œ is 8C,

(Looks like Brian got his version mangled in transmission.)

Anything above 7F is not ASCII. Various "8-bit extensions" put various non-ASCII characters at various places in the range 80-FF. Your reference shows what is essentially the Latin-1 encoding, which covers the Western European languages. (Strictly, it is Windows-1252, Microsoft's extension of Latin-1: Œ sits at 8C there, whereas Latin-1 proper has no Œ and reserves 80-9F for control codes.) That was useful for a while [*], until the West and the East began talking to each other and found that the other party's documents were putting different characters at the same positions in different encodings.

UTF-8 uses multibyte sequences like c5 92 to represent the extra characters, which allows you to have far more than 128 of them. The pair c5 92 is the UTF-8 encoding of U+0152, which is Œ.

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256
http://www.joelonsoftware.com/articles/Unicode.html

-pd

[*] A short while, actually, because it was preceded by another encoding mess known as IBM Code Pages. Famously, in this country, IBM computers (and many 3rd party printers!) shipped with a code page missing the Danish O-slash character, which got printed as "cent"/"Yen"!

> why in my R console ?
> charToRaw("Œ")
> [1] c5 92
> is not 8C ?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
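
To see the difference between the two byte values in R itself, here is a minimal sketch. It is not part of the original exchange. It assumes that the table on ascii-code.com is Windows-1252 and that your platform's iconv() accepts the name "CP1252" (encoding names are platform-dependent, and "WINDOWS-1252" is another common spelling). The variable names are illustrative only.

```r
# Build the character from its code point, so the example does not depend
# on how this script file itself is encoded.
# U+0152 is LATIN CAPITAL LIGATURE OE.
oe <- intToUtf8(0x152)

# In UTF-8 the character takes two bytes.
charToRaw(oe)
# [1] c5 92

# Re-encoded as Windows-1252 (assumed name "CP1252"), the same character
# is the single byte shown in the table at ascii-code.com.
oe_cp1252 <- iconv(oe, from = "UTF-8", to = "CP1252")
charToRaw(oe_cp1252)
# [1] 8c

# Going the other way: read the lone byte 8c as Windows-1252 and convert it
# to UTF-8, which gives back the two-byte sequence.
back <- iconv(rawToChar(as.raw(0x8c)), from = "CP1252", to = "UTF-8")
charToRaw(back)
# [1] c5 92
```

So charToRaw() is simply reporting the bytes of the string in whatever encoding it is currently stored in. In a UTF-8 session that is c5 92, and 8C only appears once the string has been converted to the 8-bit encoding.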