Re: Surrogates and noncharacters

2015-05-12 Thread Philippe Verdy
Even if UTF-8 initially started as part of some Unix standardization process, it was for the purpose of allowing interchanges across systems. The networking concept was already there (otherwise it would not have been part of the emerging *nix standardization processes, and would have remained a

Re: Surrogates and noncharacters

2015-05-12 Thread Steffen Nurpmeso
Hans Aberg haber...@telia.com wrote: On 12 May 2015, at 16:50, Philippe Verdy verd...@wanadoo.fr wrote: Indeed, that is why UTF-8 was invented for use in Unix-like environments. Not the main reason: communication protocols, and data storage is also based on 8-bit code units (even

FYI: The world’s languages, in 7 maps and charts

2015-05-12 Thread Mark Davis ☕️
http://www.washingtonpost.com/blogs/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts/

Re: Surrogates and noncharacters

2015-05-12 Thread Hans Aberg
On 12 May 2015, at 16:50, Philippe Verdy verd...@wanadoo.fr wrote: Indeed, that is why UTF-8 was invented for use in Unix-like environments. Not the main reason: communication protocols, and data storage is also based on 8-bit code units (even if storage group them by much larger

Re: FYI: The world’s languages, in 7 maps and charts

2015-05-12 Thread Karl Williamson
On 05/12/2015 03:05 PM, Mark Davis ☕️ wrote: http://www.washingtonpost.com/blogs/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts/ // And a critique: http://languagelog.ldc.upenn.edu/nll/?p=18844

Re: FYI: The world’s languages, in 7 maps and charts

2015-05-12 Thread dzo
And a tangent, picking up on a complaint that Swahili wasn't represented on one of the 7 WaPost graphics: http://niamey.blogspot.com/2015/05/how-many-people-speak-what-in-africa.html Two other recent posts on this blog (Beyond Niamey) critique the Africa part of a set of graphics/maps of

Re: Surrogates and noncharacters

2015-05-12 Thread Philippe Verdy
2015-05-11 23:53 GMT+02:00 Hans Aberg haber...@telia.com: It is perfectly fine considering the Unicode code points as abstract integers, with UTF-32 and UTF-8 encodings that translate them into byte sequences in a computer. The code points that conflict with UTF-16 might have been merely

Re: Surrogates and noncharacters

2015-05-12 Thread Hans Aberg
On 12 May 2015, at 15:45, Philippe Verdy verd...@wanadoo.fr wrote: 2015-05-11 23:53 GMT+02:00 Hans Aberg haber...@telia.com: It is perfectly fine considering the Unicode code points as abstract integers, with UTF-32 and UTF-8 encodings that translate them into byte sequences in a

Re: Surrogates and noncharacters

2015-05-12 Thread Philippe Verdy
2015-05-12 15:56 GMT+02:00 Hans Aberg haber...@telia.com: Indeed, that is why UTF-8 was invented for use in Unix-like environments. Not the main reason: communication protocols, and data storage is also based on 8-bit code units (even if storage group them by much larger blocks). UTF-8 is