Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Jürgen Hestermann Tue, 24 Dec 2013 03:20:19 -0800

On 2013-12-23 23:08, Marco van de Voort wrote:
> But if I have to chose to kill one, it is utf8. It is the lesser used choice
> for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
> but not in APIs.


But in APIs converting would not matter much: in general, the time spent on
conversion is negligible compared to the time needed for everything else
around the API call.

I have written a file manager for Windows that can log and store millions of
files in memory. It uses the Unicode (UTF16) API of Windows and converts the
file names to UTF8 internally. There is another file manager that uses UTF16
internally and can also log millions of files. When logging the same source I
can't see any difference in performance (even when logging multiple times, so
that everything is cached!), although I have to convert and the other one does
not. But the memory footprints are very different.


>> UTF16 is the most horrible decision (all bad things combined).
> For what? Most of the sentiments I hear are echoed discussions on the web
> that are mostly about document encodings, NOT application internal
> encodings.

IMO this decision is based on the assumption that one encoding is chosen for
everything, so that the same encoding is used *everywhere* as much as possible.
Then UTF8 is the best solution.
Why use UTF16/32? They cannot be treated the same as ancient ANSI strings
either.
So what would be the reason behind it? Just wasting memory?
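
To illustrate why UTF16/32 strings cannot be indexed like one-byte ANSI
strings either, here is a small sketch. It is written in Python rather than
Pascal purely for brevity, and the sample characters are my own choice, not
taken from any real program. It shows that one code point can need two UTF16
code units, and that even in UTF32 one visible character can consist of
several code points.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"
text = "a" + clef + "b"

code_points = len(text)                          # Python counts code points
utf16_units = len(text.encode("utf-16-le")) // 2  # 16-bit code units
utf32_units = len(text.encode("utf-32-le")) // 4  # 32-bit code units
utf8_units = len(text.encode("utf-8"))            # 8-bit code units

# The clef needs a surrogate pair in UTF16, so units != characters there too.
print(code_points, utf16_units, utf32_units, utf8_units)

# Even UTF32 is not "one unit per visible character":
# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single glyph.
combined = "e\u0301"
print(len(combined))  # two code points for one visible character
```

So "one array element = one character" does not hold for any of the three
encodings; the difference is only how often the assumption breaks.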


>> UTF8 has the lowest memory demand
> Not according to 1 billion Chinese.

How many of the strings stored and processed on a Chinese computer are in
the Chinese language?
A lot of the strings are still in English (HTML etc.).
So for Asian countries the real memory demand is a mix and is not so easy to
determine.
In most Western countries UTF8 definitely uses less memory.
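
A rough way to see the "mix" effect is to measure it. The following sketch
(Python, for brevity) uses a made-up Chinese greeting and a made-up HTML
wrapper; both are only illustrative samples, and real data will have
different ratios.

```python
def sizes(text):
    """Return (UTF8 bytes, UTF16 bytes) needed to store text, without BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical sample strings.
chinese = "你好，世界"                                # pure Chinese text
html = '<p class="greeting">' + chinese + "</p>"  # same text inside markup

chinese_utf8, chinese_utf16 = sizes(chinese)
html_utf8, html_utf16 = sizes(html)

print("pure Chinese:", chinese_utf8, "vs", chinese_utf16)
print("inside HTML: ", html_utf8, "vs", html_utf16)
```

For the pure Chinese string UTF16 is smaller (2 bytes per character instead
of 3), but as soon as the ASCII markup is counted, UTF8 comes out smaller
again in this sample.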


>> On the other hand, adapting the string encoding for each
>> Widgetset/OS would be a can of worms IMO.
> If you feel that way, I think Delphi compatibility should prevail.

Why?
Free Pascal/Lazarus should mature and not repeat all the bad decisions of
Borland/Embarcadero/..


> Note that the language support for utf8 breaks down when you pass e.g. a
> "string" to rawbytestring on Windows. (because it is converted to the
> default 1-byte encoding, which is not utf8 in general).

I am not sure what you are talking about here.
For Windows I would use the unicode (UTF16) API interface exclusively and
convert it to UTF8 internally. From then on, everything should be UTF8.


> As said, UTF8 on Windows is a crutch, and attempts to workaround that moves
> Lazarus in the direction of "portability to everything as long as it is
> unix" philosophies, a la Cygwin.

For me the decision of which Unicode encoding to use is primarily
OS-independent.
Just do the conversion once at the API interface level, but then use
internally whatever was decided to be "the best" (UTF8 IMO). Conversions seem
to be unavoidable anyway.
So it is just a decision where and when they take place.
And the API level is a good place IMO.
And when other OSes use the same encoding it is even better, but that is not
the reason to choose one or the other.
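
As a sketch of what "convert once at the API level" means, consider two
small wrapper functions. This is Python only for illustration; the function
names are hypothetical, and the UTF16 buffer is simulated instead of coming
from a real Windows call.

```python
def from_wide_api(raw_utf16: bytes) -> bytes:
    """Convert a UTF16-LE buffer, as a wide-character API would return it,
    into the UTF8 representation used everywhere inside the application."""
    return raw_utf16.decode("utf-16-le").encode("utf-8")

def to_wide_api(utf8: bytes) -> bytes:
    """Convert an internal UTF8 string back to UTF16-LE for an API call."""
    return utf8.decode("utf-8").encode("utf-16-le")

# Simulated API result (assumption: no real OS call is made here).
api_buffer = "Jürgen_报告.txt".encode("utf-16-le")

internal = from_wide_api(api_buffer)   # UTF8 from here on
outgoing = to_wide_api(internal)       # only at the next API call

print(outgoing == api_buffer)  # the round trip is lossless
```

Everything between the two wrappers only ever sees UTF8, regardless of which
encoding the OS prefers.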


>> A lot of additional knowledge about strings is put on the programmer
>> because handling of strings has to be done differently depending on OS.

No! That's just the aim: if *all* Free Pascal/Lazarus programmers can rely on
having UTF8 in all cases, then you only need to handle UTF8 strings.
No IFDEFs to handle UTF16 on Windows and UTF8 on Linux.
The same code just works on *all* platforms!


> Constructs that happen to work with Linux will fail on Windows.
> Because on Windows the default 1-byte encoding is not UTF8.

The ANSI interface should not be used anymore. It is obsolete and only needed
for ancient OSes like DOS. Programmers should not be encouraged to use it
on modern platforms. Just use UTF8 *everywhere*. That should be the aim IMO.
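
The problem with a one-byte "ANSI" code page can be shown in a few lines.
This Python sketch assumes code page 1252 as the system default (other
systems use other code pages) and uses a made-up file name.

```python
# Hypothetical file name containing Western European and Chinese characters.
name = "Jürgen_报告.txt"

# Going through a one-byte code page: unrepresentable characters are replaced.
ansi = name.encode("cp1252", errors="replace")
ansi_roundtrip = ansi.decode("cp1252")

# Going through UTF8: every Unicode character survives.
utf8_roundtrip = name.encode("utf-8").decode("utf-8")

print(ansi_roundtrip)           # Chinese characters are lost as "?"
print(utf8_roundtrip == name)
```

Any string that passes through the one-byte encoding can silently lose
characters, which is exactly what happens when such a conversion is done
implicitly.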


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
