RE: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-21 Thread Yves Arrouye
SCSU doesn't look very nice to me. The idea is OK but it's just too complicated. Various proposals for encoding differences or XORs between consecutive characters are IMHO technically better: much simpler to implement and work as well. These differential schemes seem to be the way
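
A minimal sketch of the kind of differential scheme described above, for illustration only. It is not any of the specific proposals from the thread: the function names, the zigzag mapping of signed differences, and the 7-bit variable-length byte format are all assumptions chosen to keep the example short.

```python
def delta_encode(text):
    """Encode each code point as its difference from the previous one.

    Hypothetical format: the signed difference is zigzag-mapped to an
    unsigned integer, then written as a little-endian base-128 varint.
    """
    out = bytearray()
    prev = 0
    for ch in text:
        cp = ord(ch)
        diff = cp - prev
        prev = cp
        # Zigzag: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
        z = (diff << 1) if diff >= 0 else ((-diff << 1) - 1)
        # Varint: 7 payload bits per byte, high bit set on all but the last.
        while z >= 0x80:
            out.append((z & 0x7F) | 0x80)
            z >>= 7
        out.append(z)
    return bytes(out)


def delta_decode(data):
    """Inverse of delta_encode."""
    chars = []
    prev = 0
    z = 0
    shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
            continue
        # Undo the zigzag mapping.
        diff = (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)
        prev += diff
        chars.append(chr(prev))
        z = 0
        shift = 0
    return "".join(chars)


# Five Ethiopic syllables (all in the U+1200..U+137F block), as a sample.
sample = "\u12a2\u1275\u12ee\u1335\u12eb"
encoded = delta_encode(sample)
print(len(sample.encode("utf-8")), len(encoded))
```

Text that stays inside one script produces small differences, so most characters after the first fit in one or two bytes. A large jump between scripts costs more, which is the trade-off against a windowed scheme such as SCSU.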

Compressing Unicode [was: A UTF-8 based News Service]

2001-07-14 Thread Juliusz Chroboczek
[sorry if you receive this twice -- wee little problem with my mailer] Recently I created a test file of all Unicode characters in code point order (excluding the surrogates, but including all the other non-characters). I will admit up front that this is a pathological test case and

RE: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-13 Thread Yves Arrouye
SCSU is also registered as an IANA charset, although you are unlikely to find raw SCSU text on the Internet, due to its use of control characters (bytes below 0x20). And what browser supports SCSU, and what is that browser's reach in terms of population? Because that's usually what

Re: A UTF-8 based News Service

2001-07-13 Thread DougEwell2
In a message dated 2001-07-12 8:27:20 Pacific Daylight Time, [EMAIL PROTECTED] writes: The Ethiopian News Headlines has relocated to a new server at http://www.ethiozena.net/ and is making it easier than ever to read news headlines in Unicode. A companion Unicode only server is

Re: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-13 Thread DougEwell2
In a message dated 2001-07-12 22:55:09 Pacific Daylight Time, [EMAIL PROTECTED] writes: SCSU is also registered as an IANA charset, although you are unlikely to find raw SCSU text on the Internet, due to its use of control characters (bytes below 0x20). And what browser supports SCSU,

Re: A UTF-8 based News Service

2001-07-13 Thread Kevin Bracey
In message [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Encoded in UTF-8, the file was 1891 bytes long. Converted into SCSU, it dropped to 1121 bytes, which is 40% shorter than the UTF-8 version, better than UTF-16, and probably better than any existing legacy encoding for

Re: A UTF-8 based News Service

2001-07-13 Thread Keld Jørn Simonsen
On Fri, Jul 13, 2001 at 02:14:25AM +0100, David Starner wrote: As someone involved in the service I often wish there was some form of compressed Unicode encoding. The 3-byte penalty that Ethiopic bears under UTF-8 turns into higher bandwidth that web hosting services meter and charge for

Re: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-13 Thread Marcin 'Qrczak' Kowalczyk
Fri, 13 Jul 2001 03:01:10 EDT, [EMAIL PROTECTED] [EMAIL PROTECTED] writes: Unfortunately, you don't hear much about SCSU, and in particular the Unicode Consortium doesn't really seem to promote it much (although they may be trying to avoid the "too many UTF's" syndrome). SCSU doesn't look

Re: A UTF-8 based News Service

2001-07-13 Thread Daniel Yacob
[EMAIL PROTECTED] wrote: As a test, I downloaded the first article on the page: http://unicode.ethiozena.net/Gazettas/Kibrit/Archives/1993/Hamle/05/Kibrit.051193.sera.html The article, dated 1993-05-11, has the formidable title: Yesterday in the Ethiopian calendar :) insert

Re: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-13 Thread DougEwell2
In a message dated 2001-07-13 4:07:35 Pacific Daylight Time, [EMAIL PROTECTED] writes: SCSU doesn't look very nice to me. The idea is OK but it's just too complicated. Various proposals for encoding differences or XORs between consecutive characters are IMHO technically better: much

Re: A UTF-8 based News Service

2001-07-13 Thread DougEwell2
In a message dated 2001-07-13 7:00:26 Pacific Daylight Time, [EMAIL PROTECTED] writes: Sounds promising! How well does SCSU gzip? If gzip works anything like PKZIP, the answer is, very well indeed. This is because (using the explanation I have heard before) SCSU retargets Unicode text to

RE: A UTF-8 based News Service

2001-07-13 Thread Ayers, Mike
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
    Raw UTF-8       4,382,592
    Zipped UTF-8    2,264,152  (52% of raw UTF-8)
    Raw SCSU        1,179,688  (27% of raw UTF-8)
    Zipped SCSU       104,316  (9% of raw SCSU, 5% of zipped UTF-8)
The data set is truly
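
The percentages quoted with these figures can be checked with a few lines of arithmetic. The byte counts below are copied from the message; the dictionary keys and the pct helper are names made up for this sketch.

```python
# Byte counts as quoted in the message above.
sizes = {
    "raw_utf8": 4_382_592,
    "zip_utf8": 2_264_152,
    "raw_scsu": 1_179_688,
    "zip_scsu": 104_316,
}


def pct(part, whole):
    """Return part as a percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


print("zipped UTF-8 vs raw UTF-8:   ", pct(sizes["zip_utf8"], sizes["raw_utf8"]), "%")
print("raw SCSU vs raw UTF-8:       ", pct(sizes["raw_scsu"], sizes["raw_utf8"]), "%")
print("zipped SCSU vs raw SCSU:     ", pct(sizes["zip_scsu"], sizes["raw_scsu"]), "%")
print("zipped SCSU vs zipped UTF-8: ", pct(sizes["zip_scsu"], sizes["zip_utf8"]), "%")
```

The four ratios round to 52, 27, 9 and 5 percent, which agrees with the figures in the message.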

Re: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-13 Thread Rick McGowan
Unfortunately, you don't hear much about SCSU, and in particular the Unicode Consortium doesn't really seem to promote it much (although they may be trying to avoid the "too many UTF's" syndrome). Probably that's one point. But also, SCSU is something that's a little more complicated to

Re: More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-13 Thread David Starner
From: [EMAIL PROTECTED] None as far as I know, which sort of destroys the whole plan. It would sure be nice if MSIE and Navigator started quietly supporting SCSU, in the same way that they quietly (to the average user) began supporting UTF-8. If you want the code in Navigator, write it up

Re: A UTF-8 based News Service

2001-07-13 Thread David Starner
From: Keld Jørn Simonsen [EMAIL PROTECTED] UTF-16 is not just 2 bytes, it is sometimes 2 and sometimes 4 bytes. IETF is recommending UTF-8 as the prime charset in all Internet protocols. Blah. For his purposes, UTF-16 is 2 bytes. The odds his newspaper will have significant quantities of

Re: A UTF-8 based News Service

2001-07-12 Thread DougEwell2
In a message dated 2001-07-12 8:27:20 Pacific Daylight Time, [EMAIL PROTECTED] writes: As someone involved in the service I often wish there was some form of compressed Unicode encoding. The 3-byte penalty that Ethiopic bears under UTF-8 turns into higher bandwidth that web hosting

A UTF-8 based News Service

2001-07-12 Thread Daniel Yacob
Greetings, I thought this would be of interest to people here who might be involved in multilingual news services: The Ethiopian News Headlines has relocated to a new server at http://www.ethiozena.net/ and is making it easier

More about SCSU (was: Re: A UTF-8 based News Service)

2001-07-12 Thread DougEwell2
I should have also mentioned that SCSU is fully supported by the programming toolkit ICU (International Components for Unicode), found at: http://oss.software.ibm.com/icu/ An Open Source project, ICU is available for free and comes with voluminous documentation. SCSU is also registered

Re: A UTF-8 based News Service

2001-07-12 Thread David Starner
As someone involved in the service I often wish there was some form of compressed Unicode encoding. The 3-byte penalty that Ethiopic bears under UTF-8 turns into higher bandwidth that web hosting services meter and charge for by the megabyte. For a popular site this soon makes UTF-8 a
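
The 3-byte penalty mentioned above is easy to see by encoding a short Ethiopic string both ways. This is a small illustration only; the sample string is an arbitrary choice of five syllables from the Ethiopic block (U+1200..U+137F), not text from the news service.

```python
# Five Ethiopic syllables, written as escapes so the source stays ASCII.
sample = "\u12a2\u1275\u12ee\u1335\u12eb"

# Code points from U+0800 to U+FFFF take 3 bytes each in UTF-8,
# but fit in a single 16-bit code unit (2 bytes) in UTF-16.
utf8_len = len(sample.encode("utf-8"))
utf16_len = len(sample.encode("utf-16-be"))  # big-endian, no byte order mark

print("characters:", len(sample))
print("UTF-8 bytes:", utf8_len)
print("UTF-16 bytes:", utf16_len)
```

For running Ethiopic text the UTF-8 form is therefore about one and a half times the size of the UTF-16 form, although ASCII markup and spaces in a real HTML page narrow the gap.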