Walter Bright:
> The problem with dchar's is strings of them consume 
> memory at a prodigious rate.

Warning: lazy musings ahead.

I hope we'll soon have computers with 200+ GB of RAM, where using strings of 
anything smaller than 32-bit chars is in most cases a premature optimization 
(just as today it is often a silly optimization to use arrays of 16-bit ints 
instead of 32-bit or 64-bit ones; only special situations found with a 
profiler can justify arrays of shorts in a low-level language).
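
Just to give an idea of the sizes involved, here is a small D sketch (the 
byte counts only cover the code units, not the array overhead):

import std.stdio;

void main() {
    // The same ASCII text stored as UTF-8 chars and as 32-bit dchars.
    string  s8  = "The quick brown fox jumps over the lazy dog.";
    dstring s32 = "The quick brown fox jumps over the lazy dog."d;

    writefln("UTF-8 : %s bytes", s8.length  * char.sizeof);   // 44 bytes
    writefln("UTF-32: %s bytes", s32.length * dchar.sizeof);  // 176 bytes
}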

Even on a PC with 200 GB of RAM the first levels of the CPU cache can be very 
small (like 32 KB), and cache misses are costly, so even when huge amounts of 
RAM are available it can still be useful to reduce the size of strings to gain 
performance.
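
For instance (back-of-the-envelope numbers): a 20,000-character ASCII text is 
about 20 KB as UTF-8, so it fits in a 32 KB L1 data cache, but about 80 KB as 
32-bit dchars, so it doesn't.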

A possible solution to this problem could be some kind of real-time hardware 
compression/decompression between the CPU and the RAM. UTF-8 is a good enough 
way to compress 32-bit strings, but then we are back to writing low-level 
programs that have to deal with UTF-8.
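
In D terms this "compression" is just a transcoding step; a minimal sketch, 
using the Phobos std.utf functions:

import std.stdio;
import std.utf : toUTF8, toUTF32;

void main() {
    // A 32-bit "uncompressed" string, mostly ASCII with a few accented chars.
    dstring wide = "naïve café résumé"d;

    // "Compress" to UTF-8 and measure the gain.
    string packed = toUTF8(wide);
    writefln("UTF-32: %s bytes", wide.length   * dchar.sizeof);  // 68 bytes
    writefln("UTF-8 : %s bytes", packed.length * char.sizeof);   // 21 bytes

    // The round trip is lossless.
    assert(toUTF32(packed) == wide);
}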

To avoid this, the CPU and RAM could compress/decompress the text transparently 
to the programmer. Unfortunately UTF-8 is a variable-length encoding, so maybe 
it can't be made transparent enough. A smarter and better compression algorithm 
could be used to keep all this transparent enough (not fully transparent: some 
low-level situations may still require code that deals with the compression).
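
The non-transparency is easy to see even in today's software: with UTF-8 you 
can't index the N-th character in O(1), you have to decode from the start. A 
small D sketch, using std.utf.decode:

import std.stdio;
import std.utf : decode;

void main() {
    string s = "il naïf è qui";  // UTF-8: some chars take 1 byte, some take 2

    // s[i] would give a code unit (a byte), not a character.
    // To enumerate the code points we must decode from the start.
    size_t i = 0;
    size_t n = 0;
    while (i < s.length) {
        dchar c = decode(s, i);  // advances i by the length of the code point
        writefln("code point %s: %s", n, c);
        ++n;
    }
}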

Bye,
bearophile
