On Tue, 26 Feb 2019, Marco van de Voort wrote:


On 2/25/2019 at 9:27 PM, Michael Van Canneyt wrote:
 I'm currently involved in some TRegistry bugs and regressions.
Personally I don't use TRegistry in any of my programs.
Also I mostly use Lazarus, so most of the issues don't affect me.

However I would like to share some observations and thoughts.

TRegistry on Windows now (3.2+) uses the Unicode API.
String input parameters in the various methods get "promoted" to
Unicode and then the API is called.
Returned string values however are mostly encoded in UTF8, by
explicitly calling Utf8Encode(SomeUnicodeString).
Is that (enforcing UTF8 encoding) by design?
(The Ansi to Unicode conversion was done via UTF8Decode, which is
definitely wrong and is fixed by now.)

On Lazarus, this is no problem, since by default all strings are UTF8
encoded, so all conversions are lossless.

I think Lazarus users are the main TRegistry users, so I would keep current
behaviour for the public API. Where possible add overloads that use a
unicodestring, and let the UTF8 one call the unicode one.

The current situation does not improve anything for Lazarus users that set the default encoding to utf8 (aka utf8hack).

If I look into e.g. registry.pp, the only use of utf8encode there is like this:

  var
    s: string;
    u: unicodestring;

  s := utf8encode(u);

which, IF Lazarus is used in the default utf8 mode, is equivalent to

  s := u;

So currently this utf8encode only makes things worse for people who don't set the default codepage to utf8?

If I'm wrong, what is the exact behaviour that you want to keep?
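The difference between the two assignments above can be observed with a small test program. This is only a sketch, not code from registry.pp: the program and variable names are made up for illustration, and the numbers it prints depend on the platform, the default system code page and the installed widestring manager (on Unix-like systems a unit such as cwstring is typically needed for non-ASCII conversions to be meaningful).

```pascal
program Utf8EncodeVsAssign;
{$mode objfpc}{$H+}

var
  u: UnicodeString;
  viaEncode, viaAssign: String;
  sameBytes: Boolean;
begin
  // Build "caf" + U+00E9 without relying on the source file encoding.
  u := 'caf';
  u := u + WideChar($00E9);

  // Explicit UTF-8 encoding, independent of the default code page.
  viaEncode := UTF8Encode(u);

  // Implicit conversion to the default system code page.
  viaAssign := u;

  // Compare the raw bytes, bypassing any code page aware comparison.
  sameBytes := (Length(viaEncode) = Length(viaAssign)) and
               (CompareByte(viaEncode[1], viaAssign[1], Length(viaEncode)) = 0);

  WriteLn('DefaultSystemCodePage  : ', DefaultSystemCodePage);
  WriteLn('UTF8Encode -> code page: ', StringCodePage(viaEncode),
          ', bytes: ', Length(viaEncode));
  WriteLn('assignment -> code page: ', StringCodePage(viaAssign),
          ', bytes: ', Length(viaAssign));
  WriteLn('identical bytes        : ', sameBytes);
end.
```

When the default system code page is UTF-8, both strings should hold the same bytes; with any other default code page they are expected to differ, which is the case the question above is about.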

If I understood the OP correctly, he wants to change the use of "string"
arguments in the public API to unicodestring.

That changes a lot.

Contrary to popular belief, the conversion will not automatically be
correct, and will produce errors.

(See e.g. https://bugs.freepascal.org/view.php?id=35113
for a similar situation where part of the error is that the lazarus
user must explicitly call Utf8Decode.)

So my proposal is to leave the public API as-is, using string, adding
unicode string overloads where possible/useful.

Internally, convert to whatever fits best.

If the internal routines are easier to maintain/understand if they use
unicode string throughout: refactor them to use unicode.
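A minimal sketch of what this could look like follows. THypotheticalStore and its methods are invented for illustration and are not the TRegistry API. It assumes the behaviour described earlier in the thread: string input is promoted to unicode, and string results are returned UTF-8 encoded.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical stand-in for a registry-like class; NOT the real TRegistry. }
  THypotheticalStore = class
  private
    FValue: UnicodeString;  // internal code works on UnicodeString throughout
  public
    procedure WriteValue(const AValue: UnicodeString); overload;
    procedure WriteValue(const AValue: String); overload;
    function ReadValueUnicode: UnicodeString;
    function ReadValue: String;
  end;

procedure THypotheticalStore.WriteValue(const AValue: UnicodeString);
begin
  // The real work happens here; a real class would call the Unicode API.
  FValue := AValue;
end;

procedure THypotheticalStore.WriteValue(const AValue: String);
begin
  // Existing string API: promote to unicode, then reuse the unicode overload.
  WriteValue(UnicodeString(AValue));
end;

function THypotheticalStore.ReadValueUnicode: UnicodeString;
begin
  Result := FValue;
end;

function THypotheticalStore.ReadValue: String;
begin
  // Assumption: keep returning UTF-8, as the thread describes for 3.2+.
  Result := UTF8Encode(ReadValueUnicode);
end;

var
  Store: THypotheticalStore;
  U: UnicodeString;
  S: String;
begin
  Store := THypotheticalStore.Create;
  try
    U := 'plain ascii';
    Store.WriteValue(U);    // picks the UnicodeString overload
    S := Store.ReadValue;   // string-based callers keep working unchanged
    WriteLn(S);
  finally
    Store.Free;
  end;
end.
```

Existing callers that pass and receive string are not affected, while new callers can use the unicode variants and avoid any conversion.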

Michael.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
