Marco,

I still remember the Univac I, which had memory tubes about the size of your
fist (the Univac II used core).  The 1401, however, was a fully
transistorized computer.  It used core memory that ranged in size from 1,400
to 16,000 6-bit bytes.  (Unicode on 6-bit machines is another challenge.)

You are right about font files being big.  However, there is no "Unicode
font" as such, so you have the same large font files even without Unicode.
Large font files are why some printers have their own disk drives.

Part of the reason that Unicode implementations are so large is that we need
translation tables to maintain compatibility with old code pages.  Eliminate
these code pages and we reduce the size of the Unicode implementation.  At
least Windows is going in the right direction.  All future scripts will be
Unicode only.  This way they don't have to carry the other baggage.
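
To make that concrete, here is a minimal sketch in C of what one of these
translation tables looks like (the function name is mine; the two sample
values are from the standard windows-1252 mapping):

    #include <stdio.h>

    /* Sketch only: a legacy code page needs a table like this to map its
       8-bit codes to Unicode, and an implementation carries one per
       supported code page.  That is where much of the bulk comes from. */
    static unsigned short cp1252_to_unicode(unsigned char c)
    {
        if (c < 0x80)
            return c;                 /* the ASCII range maps to itself */
        switch (c) {
        case 0x80: return 0x20AC;     /* EURO SIGN */
        case 0xE9: return 0x00E9;     /* LATIN SMALL LETTER E WITH ACUTE */
        default:   return 0xFFFD;     /* the other 126 high entries are
                                         omitted from this sketch */
        }
    }

    int main(void)
    {
        printf("0xE9 -> U+%04X\n", (unsigned)cp1252_to_unicode(0xE9));
        return 0;
    }

Drop the code page and the whole table goes with it.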

People may talk about line breaking, collation, fonts, etc. being resource
hogs.  In actuality you need the same resources for code page systems as
well.  With Unicode, however, you get to reuse some of these resources if you
support multiple scripts.

For systems like Windows, the limit was reached with things like the
Arabic/French systems.  Beyond that you really need to use Unicode or you
will have real code bloat.  Unicode is the only practical solution for
multilingual systems.

Carl



-----Original Message-----
From: Marco Cimarosti [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 19, 2001 2:36 AM
To: Unicode List
Cc: 'Carl W. Brown'; 'Kenneth Whistler'
Subject: RE: Latin w/ diacritics (was Re: benefits of unicode)


Carl Brown wrote:
> If these folks really want Unicode everywhere I will write
> Unicode for the IBM 1401 if they are willing to foot the
> bill.  Seriously I would never agree to such a ludicrous
> idea.

Thanks, Carl, but if "these folks" is me, I don't even know what an IBM 1401
is, let alone need you to write Unicode support for it.

If I am allowed to introduce one more anachronism, there exists a concept
called "portability". So, once one of these nutshell implementations of
Unicode exists (on, say, a DOS box with a bitmapped font), it would not be
necessary to re-write it from scratch for each subsequent "end-of-lifed
unsupported OS" or embedded device.

I hope this may cast a slightly different light on the effort-to-usefulness
ratio of this.

> Can you imagine a Unicode 3.1 character properties table that
> uses 16bit addressing?

I am not sure what you mean but, yes, I can imagine it very well.

But it would be an unnecessary waste to load the whole database in memory,
although it would be possible: the version 3.1 character properties file
contains only about 13,000 lines. Multiply this by the 32 bits of a DOS "far
pointer", and you obtain an array that still fits in a 64KB segment. OK:
this array would overflow the segment as soon as about 3,000 more characters
are added to Unicode...
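
A quick back-of-the-envelope check of those figures, as a small C sketch (the
numbers are the round ones above, not exact UCD counts):

    #include <stdio.h>

    int main(void)
    {
        unsigned long entries = 13000UL;        /* ~lines in the 3.1 property file */
        unsigned long ptr     = 4UL;            /* a 32-bit DOS "far pointer"      */
        unsigned long segment = 64UL * 1024UL;  /* one 64KB segment                */

        unsigned long bytes = entries * ptr;    /* 52,000 bytes: it fits           */
        printf("table %lu bytes, segment %lu bytes\n", bytes, segment);
        printf("room for about %lu more entries\n",
               (segment - bytes) / ptr);        /* ~3,400, hence the limit above   */
        return 0;
    }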

But loading whole tables (or fonts) into memory is not really the way to go;
you wouldn't do this even in much more powerful environments. It would be
much better to keep the data in a file and access it through an efficient
file indexing method and a well-tuned cache algorithm.
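
As a rough sketch of what I mean (the record layout, the cache size and the
file format are all hypothetical; a real implementation would tune them):

    #include <stdio.h>

    /* Assumed on-disk format: fixed-size records sorted by code point,
       so a property can be found by binary search instead of loading a
       whole table into memory. */
    struct prop_rec {
        unsigned long  code;        /* code point            */
        unsigned char  category;    /* general category code */
        unsigned char  combining;   /* combining class       */
        unsigned short flags;       /* other property bits   */
    };

    /* A very small direct-mapped cache in front of the file. */
    #define CACHE_SLOTS 64
    static struct prop_rec cache[CACHE_SLOTS];
    static int             cache_used[CACHE_SLOTS];

    /* Returns 1 and fills *out if the code point is in the file, else 0. */
    static int get_props(FILE *db, long nrecords, unsigned long cp,
                         struct prop_rec *out)
    {
        int  slot = (int)(cp % CACHE_SLOTS);
        long lo = 0, hi = nrecords - 1;

        if (cache_used[slot] && cache[slot].code == cp) {  /* cache hit */
            *out = cache[slot];
            return 1;
        }
        while (lo <= hi) {                                 /* binary search on disk */
            long mid = lo + (hi - lo) / 2;
            struct prop_rec rec;
            if (fseek(db, mid * (long)sizeof rec, SEEK_SET) != 0 ||
                fread(&rec, sizeof rec, 1, db) != 1)
                return 0;
            if (rec.code == cp) {
                cache[slot] = rec;                         /* remember it */
                cache_used[slot] = 1;
                *out = rec;
                return 1;
            }
            if (rec.code < cp) lo = mid + 1; else hi = mid - 1;
        }
        return 0;
    }

With a handful of records cached, repeated lookups of the same characters
never touch the disk at all.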

> Unicode take lots of memory.

I promise that I won't use the word "myth" for at least a week.

But my impression is that it is rather systems like OpenType and ATSUI that
take lots of memory. And this is neither a surprise nor a scandal: these
systems are designed for OS's that require lots of memory for *everything*.

But this should not lead us to the conclusion that Unicode itself is a
memory-eating monster. It is just a character set! The memory and storage
requirements of Unicode are not so terribly greater than those of, say, older
double-byte systems.
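
For instance (my own rough comparison, taking windows-1252 for the Latin rows
and a typical DBCS such as Shift-JIS for the CJK row as the "legacy" column):

    #include <stdio.h>

    int main(void)
    {
        /* bytes per character: legacy code page vs. UTF-16 vs. UTF-8 */
        printf("%-28s %6s %6s %5s\n", "character", "legacy", "UTF-16", "UTF-8");
        printf("%-28s %6d %6d %5d\n", "U+0041 LATIN CAPITAL A",     1, 2, 1);
        printf("%-28s %6d %6d %5d\n", "U+00E9 LATIN SMALL E ACUTE", 1, 2, 2);
        printf("%-28s %6d %6d %5d\n", "U+65E5 CJK IDEOGRAPH",       2, 2, 3);
        return 0;
    }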

_ Marco

