On Sun, 23 Nov 2008, listmember wrote:

> What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); it would still be UTF-8 or whatever.
>
> I am only considering in memory representation being UTF-32 (or UCS-4).
>
> This way, loading from and saving to would hardly be affected, yet in-memory operations would be a lot faster and more simplified.

For source code, an extended ASCII charset like UTF-8 is the best choice: all characters that need processing are in the ASCII range, so the code needs to do nothing with the high ASCII codes except keep them intact.

Therefore, any other encoding is a waste of memory and does not gain you any speed. For that reason, I don't see the compiler switching away from 8-bit processing either.
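
To illustrate the point about source code, here is a rough sketch in Python (not compiler code; the `scan` function and its token names are made up for this example). It tokenizes UTF-8 source while looking only at ASCII bytes. This works because every byte of a multi-byte UTF-8 sequence is >= 0x80, so it can never be mistaken for a quote or an identifier character. The sketch assumes well-formed input and ignores Pascal's doubled-quote escaping.

```python
import string

# Bytes that may appear in an identifier; all of them are plain ASCII.
IDENT_BYTES = frozenset((string.ascii_letters + string.digits + "_").encode("ascii"))
QUOTE = ord("'")

def scan(source: bytes):
    """Hypothetical mini-scanner: returns (kind, raw_bytes) tokens."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        b = source[i]
        if b == QUOTE:
            # Find the closing quote. Bytes >= 0x80 inside the literal
            # can never equal 0x27, so they are simply carried along.
            end = source.index(b"'", i + 1)
            tokens.append(("string", source[i:end + 1]))
            i = end + 1
        elif b in IDENT_BYTES:
            start = i
            while i < n and source[i] in IDENT_BYTES:
                i += 1
            tokens.append(("ident", source[start:i]))
        else:
            i += 1  # whitespace and punctuation are skipped in this sketch
    return tokens

src = "writeln('Daniël')".encode("utf-8")
tokens = scan(src)
```

The string token still holds the two bytes of "ë" untouched, and the scanner never had to decode them.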

The situation is very different when processing real text: the memory-saving advantage disappears for the majority of the world, and if you want to process characters beyond #127, UTF-16 and UTF-32 are much easier. Obviously, UTF-32 is the best encoding if the characters you need to process lie beyond #65535.
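
As a quick illustration of the memory argument, this small Python sketch compares the encoded size of a few sample strings. The helper name `encoded_sizes` and the sample strings are chosen only for this example.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in each encoding (no BOM)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}

ascii_sizes = encoded_sizes("hello")        # pure ASCII: UTF-8 is smallest
cyrillic_sizes = encoded_sizes("привет")    # 2 bytes per letter in UTF-8 and UTF-16
cjk_sizes = encoded_sizes("你好")            # 3 bytes per character in UTF-8, 2 in UTF-16
clef_sizes = encoded_sizes("\U0001D11E")    # beyond #65535: 4 bytes in all three
```

For the character beyond #65535, UTF-16 needs two 16-bit units (a surrogate pair), so only UTF-32 keeps one unit per character there.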

UTF-32 is a lot faster and simpler only if you need to process characters, rather than just pass them on.
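
A sketch of why character processing is simpler in UTF-32, again in Python with function names invented for the example: fetching the n-th character is a fixed-offset read in UTF-32, while in UTF-8 you have to walk the lead bytes from the start. It assumes valid, BOM-less input and does no error checking.

```python
import struct

def seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

def nth_char_utf32(buf: bytes, n: int) -> int:
    # Fixed width: character n always lives at byte offset 4 * n.
    return struct.unpack_from("<I", buf, 4 * n)[0]

def nth_char_utf8(buf: bytes, n: int) -> int:
    # Variable width: skip n sequences one by one before decoding.
    i = 0
    for _ in range(n):
        i += seq_len(buf[i])
    return ord(buf[i:i + seq_len(buf[i])].decode("utf-8"))

text = "aë你\U0001D11Ez"
utf8_buf = text.encode("utf-8")
utf32_buf = text.encode("utf-32-le")
```

Both functions return the same code point for the same index; the difference is that the UTF-8 version costs time proportional to `n`.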

Daniël
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
