SCSU doesn't look very nice to me. The idea is OK but it's just
too complicated. Various proposals for encoding differences or XORs
between consecutive characters are IMHO technically better: much
simpler to implement, and they work as well.
These differential schemes seem to be the way
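
A minimal sketch of the kind of differential scheme described here, in Python. It is an illustration only and not any of the specific proposals made on the list: each code point is replaced by its difference from the previous one, and that difference is written as a zigzag-mapped variable-length integer. The function names and the byte format are assumptions made for this example.

```python
def _zigzag(n: int) -> int:
    # Map signed deltas to unsigned values: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def _unzigzag(u: int) -> int:
    # Inverse of _zigzag
    return (u >> 1) if (u & 1) == 0 else -((u + 1) >> 1)


def diff_encode(text: str) -> bytes:
    """Encode text as varint-coded differences between consecutive code points."""
    out = bytearray()
    prev = 0
    for ch in text:
        cp = ord(ch)
        u = _zigzag(cp - prev)
        prev = cp
        # Varint: 7 payload bits per byte, high bit set means "more bytes follow"
        while True:
            low = u & 0x7F
            u >>= 7
            if u:
                out.append(low | 0x80)
            else:
                out.append(low)
                break
    return bytes(out)


def diff_decode(data: bytes) -> str:
    """Invert diff_encode."""
    chars = []
    prev = 0
    u = 0
    shift = 0
    for b in data:
        u |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += _unzigzag(u)
            chars.append(chr(prev))
            u = 0
            shift = 0
    return "".join(chars)
```

For text that stays within one small script block, every character after the first has a small delta and so takes one byte in this format; the first character pays for the jump from zero.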
[sorry if you receive this twice -- wee little problem with my mailer]
> Recently I created a test file of all Unicode characters in code
> point order (excluding the surrogates, but including all the other
> non-characters). I will admit up front that this is a pathological
> test case and
SCSU is also registered as an IANA charset, although you are
unlikely to find raw SCSU text on the Internet, due to its use of
control characters (bytes below 0x20).
And what browser supports SCSU, and what is that browser's reach in terms of
population? Because that's usually what
In a message dated 2001-07-12 8:27:20 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
The Ethiopian News Headlines has relocated to a new server at
http://www.ethiozena.net/ and is making it easier than ever to
read news headlines in Unicode. A companion Unicode only server
is
In a message dated 2001-07-12 22:55:09 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
SCSU is also registered as an IANA charset, although you are
unlikely to find raw SCSU text on the Internet, due to its use of
control characters (bytes below 0x20).
And what browser supports SCSU,
In message [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:
Encoded in UTF-8, the file was 1891 bytes long. Converted into SCSU, it
dropped to 1121 bytes, which is 40% shorter than the UTF-8 version, better
than UTF-16, and probably better than any existing legacy encoding for
On Fri, Jul 13, 2001 at 02:14:25AM +0100, David Starner wrote:
As someone involved in the service I often wish there was some
form of compressed Unicode encoding. The 3-byte penalty that
Ethiopic bears under UTF-8 turns into higher bandwidth that web
hosting services meter and charge for
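
The 3-byte penalty can be checked directly. A small Python sketch, using five Ethiopic syllables chosen here purely as sample data (they are not taken from the thread):

```python
# Five Ethiopic syllables from the U+1200..U+137F block (illustrative sample)
sample = "\u12a2\u1275\u12ee\u1335\u12eb"

chars = len(sample)
utf8_len = len(sample.encode("utf-8"))       # 3 bytes per character in this block
utf16_len = len(sample.encode("utf-16-be"))  # 2 bytes per character, no BOM

print(chars, utf8_len, utf16_len)
```

Every code point from U+0800 through U+FFFF takes three bytes in UTF-8, which is why Ethiopic text is half again as large in UTF-8 as in UTF-16.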
Fri, 13 Jul 2001 03:01:10 EDT, [EMAIL PROTECTED] [EMAIL PROTECTED] writes:
Unfortunately, you don't hear much about SCSU, and in particular
the Unicode Consortium doesn't really seem to promote it much
(although they may be trying to avoid the "too many UTF's" syndrome).
SCSU doesn't look
[EMAIL PROTECTED] wrote:
As a test, I downloaded the first article on the page:
http://unicode.ethiozena.net/Gazettas/Kibrit/Archives/1993/Hamle/05/Kibrit.051193.sera.html
The article, dated 1993-05-11, has the formidable title:
Yesterday in the Ethiopian calendar :) insert
In a message dated 2001-07-13 4:07:35 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
SCSU doesn't look very nice to me. The idea is OK but it's just
too complicated. Various proposals for encoding differences or XORs
between consecutive characters are IMHO technically better: much
In a message dated 2001-07-13 7:00:26 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
Sounds promising! How well does SCSU gzip?
If gzip works anything like PKZIP, the answer is, very well indeed. This is
because (using the explanation I have heard before) SCSU retargets Unicode
text to
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Raw UTF-8      4,382,592
Zipped UTF-8   2,264,152  (52% of raw UTF-8)
Raw SCSU       1,179,688  (27% of raw UTF-8)
Zipped SCSU      104,316  (9% of raw SCSU, 5% of zipped UTF-8)
The data set is truly
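
The percentages in the table above follow from the byte counts. A short Python sketch that recomputes them (the dictionary keys and the helper name `pct` are just labels chosen for this example):

```python
# Byte counts as reported in the message above
sizes = {
    "raw_utf8": 4_382_592,
    "zipped_utf8": 2_264_152,
    "raw_scsu": 1_179_688,
    "zipped_scsu": 104_316,
}


def pct(part: int, whole: int) -> int:
    """Return part as a percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


print(pct(sizes["zipped_utf8"], sizes["raw_utf8"]))     # zipped UTF-8 vs raw UTF-8
print(pct(sizes["raw_scsu"], sizes["raw_utf8"]))        # raw SCSU vs raw UTF-8
print(pct(sizes["zipped_scsu"], sizes["raw_scsu"]))     # zipped SCSU vs raw SCSU
print(pct(sizes["zipped_scsu"], sizes["zipped_utf8"]))  # zipped SCSU vs zipped UTF-8
```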
Unfortunately, you don't hear much about SCSU, and in particular the Unicode
Consortium doesn't really seem to promote it much (although they may be
trying to avoid the too many UTF's syndrome).
Probably that's one point. But also, SCSU is something that's a little more
complicated to
From: [EMAIL PROTECTED]
None as far as I know, which sort of destroys the whole plan. It would
sure be nice if MSIE and Navigator started quietly supporting SCSU, in the
same way that they quietly (to the average user) began supporting UTF-8.
If you want the code in Navigator, write it up
From: Keld Jørn Simonsen [EMAIL PROTECTED]
UTF-16 is not just 2 bytes, it is sometimes 2 and sometimes 4 bytes.
IETF is recommending UTF-8 as the prime charset in all Internet protocols.
Blah. For his purposes, UTF-16 is 2 bytes. The odds his newspaper will have
significant quantities of
In a message dated 2001-07-12 8:27:20 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
As someone involved in the service I often wish there was some
form of compressed Unicode encoding. The 3-byte penalty that
Ethiopic bears under UTF-8 turns into higher bandwidth that web
hosting
Greetings,
I thought this would be of interest to people here who might be
involved in multilingual news services:
The Ethiopian News Headlines has relocated to a new server at
http://www.ethiozena.net/ and is making it easier
I should have also mentioned that SCSU is fully supported by the programming
toolkit ICU (International Components for Unicode), found at:
http://oss.software.ibm.com/icu/
An Open Source project, ICU is available for free and comes with voluminous
documentation.
SCSU is also registered
As someone involved in the service I often wish there was some
form of compressed Unicode encoding. The 3-byte penalty that
Ethiopic bears under UTF-8 turns into higher bandwidth that web
hosting services meter and charge for by the megabyte. For a
popular site this soon makes UTF-8 a