On 9/5/06, Rich Felker <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 04, 2006 at 11:44:26PM -0500, David Starner wrote:
> > Once you compress the data with a decent compression scheme, you may
> > as well store the data by writing out the full Unicode name (e.g.
> > "LATIN CAPITAL LETTER OU"); the final result will be about the same
> > size.
> With some compression methods this is true, particularly bz2.
> > Furthermore, you can fit a decent sized novel on a floppy
> > uncompressed and a decent sized library on a DVD uncompressed.
> Yet somehow the Firefox source code is still 36 megs (bz2), and god
> only knows how large OOo is. Imagine now if all the variable and
> function names were written in Hindi or Thai... It would be an
> interesting test to transliterate the Latin letters to Devanagari and
> see how much the compressed tarball size goes up.
The whole point of the above test is that it would change the size
minimally. It shouldn't make much difference, if any.
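The test Rich suggests can be sketched roughly as follows. This assumes bz2 as the compressor and a hypothetical one-to-one letter mapping (a-z to U+0905..U+091E, A-Z to U+091F..U+0938) chosen only for illustration; it is not a real transliteration scheme. Because the mapping is reversible, the text carries the same information before and after, so any growth in the bz2 figure is overhead from the encoding alone.

```python
import bz2
import sys

# Hypothetical mapping for illustration only: a-z -> U+0905..U+091E and
# A-Z -> U+091F..U+0938. Every code point in that range is a Devanagari
# letter, and the mapping is one-to-one, so no information is lost.
LATIN = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
DEVANAGARI = "".join(chr(0x0905 + i) for i in range(len(LATIN)))
TO_DEVANAGARI = str.maketrans(LATIN, DEVANAGARI)


def transliterate(text: str) -> str:
    """Replace every ASCII letter with its assigned Devanagari letter."""
    return text.translate(TO_DEVANAGARI)


def sizes(text: str) -> dict:
    """Raw and bz2-compressed UTF-8 sizes, before and after transliteration."""
    original = text.encode("utf-8")
    converted = transliterate(text).encode("utf-8")
    return {
        "raw_latin": len(original),
        "raw_devanagari": len(converted),
        "bz2_latin": len(bz2.compress(original)),
        "bz2_devanagari": len(bz2.compress(converted)),
    }


if __name__ == "__main__":
    # Concatenate all files named on the command line and report sizes.
    text = ""
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            text += f.read()
    for key, value in sizes(text).items():
        print(f"{key:16} {value}")
```

Run as, e.g., `python3 translit_test.py *.c` (the script name is hypothetical). Each Latin letter grows from one byte to three in UTF-8, so the raw Devanagari size is clearly larger; the comparison that matters for the argument is between the two bz2 figures.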
> In all seriousness, though, unless you're dealing with image, music,
> or movie files, text weighs in quite heavy in size.
As opposed to what? The vast majority of content is one of the four,
and what's left (say, Flash files) doesn't seem particularly small
compared to text.
> If you're making a website
> without fluff and with lots of information, text size will be the
> dominant factor in traffic. It's quite unfortunate that native
> language text is 3 to 6(*) times larger in countries where bandwidth
> is very expensive.
Welcome to HTTP/1.1. There's no reason not to compress the data while
you're sending it across the network, which will fix the vast majority
of this problem.
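To put rough numbers on this, the sketch below uses Python's gzip module as a stand-in for an HTTP/1.1 response sent with "Content-Encoding: gzip" (which the client has to request via Accept-Encoding). It is an estimate only: real servers add headers and may pick different compression levels, and the sample page is an arbitrary repeated phrase, not real content.

```python
from __future__ import annotations

import gzip


def wire_sizes(text: str) -> tuple[int, int]:
    """Return (uncompressed, gzip-compressed) size in bytes of UTF-8 text.

    gzip approximates a response body sent with "Content-Encoding: gzip";
    treat the numbers as estimates, not exact on-the-wire sizes.
    """
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    # Arbitrary sample: roughly "hello, world" in Hindi, repeated to page length.
    # Each Devanagari code point takes 3 bytes in UTF-8.
    page = "\u0928\u092e\u0938\u094d\u0924\u0947 \u0926\u0941\u0928\u093f\u092f\u093e\u0964 " * 200
    raw, packed = wire_sizes(page)
    print(f"{len(page)} code points, {raw} bytes as UTF-8, {packed} bytes gzipped")
```

The 3-bytes-per-character cost of Devanagari in UTF-8 shows up in the uncompressed figure; after gzip, much of that redundancy is squeezed out again.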
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/