Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Michael Van Canneyt



On Tue, 26 Feb 2019, Marco van de Voort wrote:

> On 2/25/2019 at 9:27 PM, Michael Van Canneyt wrote:
>
> > > I'm currently involved in some TRegistry bugs and regressions.
> > >
> > > Personally I don't use TRegistry in any of my programs.
> > > Also I mostly use Lazarus, so most of the issues don't affect me.
> > >
> > > However I would like to share some observations and thoughts.
> > >
> > > TRegistry on Windows now (3.2+) uses the Unicode API.
> > > String input parameters in the various methods get "promoted" to
> > > Unicode and then the API is called.
> > > Returned string values however are mostly encoded in UTF8, by
> > > explicitly calling Utf8Encode(SomeUnicodeString).
> > > Is that (enforcing UTF8 encoding) by design?
> > > (The Ansi to Unicode conversion was done via UTF8Decode, which is
> > > definitely wrong and is fixed by now.)
> > >
> > > On Lazarus, this is no problem, since by default all strings are UTF8
> > > encoded, so all conversions are lossless.
> >
> > I think Lazarus users are the main TRegistry users, so I would keep the
> > current behaviour for the public API. Where possible add overloads that
> > use a unicodestring, and let the UTF8 one call the unicode one.
>
> The current situation does not improve anything for Lazarus users that
> set the default encoding to utf8 (aka utf8hack).
>
> If I look into e.g. registry.pp, the only use of utf8encode there is
> like this:
>
>   var s : string;
>       u : unicodestring;
>
>   s:=utf8encode(u);
>
> which, IF Lazarus is used in the default utf8 mode, is equivalent to
>
>   s:=u;
>
> So currently this utf8encode only frustrates the situation for people
> that don't set the default codepage to utf8?
>
> If I'm wrong, what is the exact behaviour that you want to keep?


If I understood the OP correctly, he wants to change the use of "string"
arguments in the public API to unicodestring.

That changes a lot.

Contrary to popular belief, the conversion will not automatically be
correct, and will produce errors.

(See e.g. https://bugs.freepascal.org/view.php?id=35113
for a similar situation where part of the error is that the lazarus
user must explicitly call Utf8Decode.)

So my proposal is to leave the public API as-is, using string, adding
unicode string overloads where possible/useful.

Internally, convert to whatever fits best.

If the internal routines are easier to maintain/understand when they use
unicode string throughout, refactor them to use unicode.
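
A minimal sketch of that proposal, for illustration only. TDemoStore and its
methods are hypothetical stand-ins, not the real TRegistry API: the "string"
entry points stay in place, UnicodeString overloads sit next to them, and the
"string" versions simply forward to the Unicode ones. Because Pascal cannot
overload on the result type alone, the Unicode reader gets its own
(hypothetical) name here.

```pascal
program overloadsketch;
{$mode objfpc}{$H+}

type
  { Hypothetical stand-in for a registry-like class; NOT the real TRegistry. }
  TDemoStore = class
  private
    FValue: UnicodeString;  // internal storage is always UTF-16
  public
    procedure WriteString(const AValue: UnicodeString); overload;
    procedure WriteString(const AValue: string); overload;
    function ReadStringU: UnicodeString;  // hypothetical name for the Unicode reader
    function ReadString: string;
  end;

procedure TDemoStore.WriteString(const AValue: UnicodeString);
begin
  FValue := AValue;
end;

procedure TDemoStore.WriteString(const AValue: string);
begin
  // Convert according to the code page of AValue, then reuse the Unicode version.
  WriteString(UnicodeString(AValue));
end;

function TDemoStore.ReadStringU: UnicodeString;
begin
  Result := FValue;
end;

function TDemoStore.ReadString: string;
begin
  // Converts to DefaultSystemCodePage; lossy if that code page
  // cannot represent every character in FValue.
  Result := string(ReadStringU);
end;

var
  Store: TDemoStore;
  S: string;
begin
  Store := TDemoStore.Create;
  S := 'demo value';
  Store.WriteString(S);        // resolves to the "string" overload
  WriteLn(Store.ReadString);
  Store.Free;
end.
```

Existing callers that pass "string" keep compiling unchanged, while callers
that already hold a UnicodeString can avoid the round trip through the
default code page.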

Michael.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Good timing metric test program?

2019-02-26 Thread Arnaud Bouchez

Gareth, I like very much what you do about compiler optimizations.

From my point of view, execution speed is the most valuable, but of course a
17% compilation speed increase is worth it!

If you want to have a big test case, and potentially find regressions, you may
try the TestSQL3 project of our Open Source mORMot. It makes millions of
checks, and timings of the individual test cases are available in the console
log, for comparison.

See https://github.com/synopse/mORMot/blob/master/SQLite3/TestSQL3.lpi

How to set up mORMot is detailed in
https://synopse.info/files/html/Synopse%20mORMot%20Framework%20SAD%201.18.html#TITL_125

If you find only some part of it interesting for your performance tests
(perhaps the SQLite3 tests, or the encryption performance written in asm, won't
benefit from FPC optimization work), just ask and I could extract some test
cases for your purpose.

Great work!
Arnaud



Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Yuriy Sydorov

On 25.02.2019 20:03, Bart wrote:

> On Lazarus, this is no problem, since by default all strings are UTF8
> encoded, so all conversions are lossless.
>
> In a plain fpc program though on Windows, default encoding is the
> current codepage (cp1252 in my case) and information will get lost
> when you process the result further.


If you just recompile an unmodified non-Unicode app for Windows, string data
loss is inevitable if the data is not encoded using the current ANSI encoding,
even if you use UnicodeString parameters and results for the registry functions.

Just adding a single line to the beginning of the app will solve the data loss:

DefaultSystemCodePage:=CP_UTF8;

I would leave the registry parameters and results as "string", as is done in
other parts of the RTL/FCL.
For performance-critical functions it is possible to add overloaded versions
which accept UnicodeString parameters. IMO registry access is not time-critical.
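
For illustration, a small sketch of what that one line changes. It is not taken
from the RTL sources. It calls the RTL procedure
SetMultiByteConversionCodePage to set DefaultSystemCodePage rather than
assigning the variable directly, and it assumes that a widestring manager
able to handle UTF-8 is installed (on Unix-like systems the cwstring unit
provides one). The sample text and variable names are made up.

```pascal
program cpdemo;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // installs a widestring manager on Unix-like systems
{$endif}

var
  U: UnicodeString;
  S: AnsiString;
begin
  // Build "caf" + e-acute + euro sign from code points, so the result
  // does not depend on the encoding of this source file.
  U := 'caf';
  U := U + WideChar($00E9) + WideChar($20AC);

  // Make UTF-8 the target of implicit Unicode -> AnsiString conversions.
  SetMultiByteConversionCodePage(CP_UTF8);

  S := U;  // implicit conversion to DefaultSystemCodePage
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('bytes in S:            ', Length(S));
  WriteLn('round trip intact:     ', UnicodeString(S) = U);
end.
```

Without the SetMultiByteConversionCodePage call, the same assignment targets
the system ANSI code page, and any character that code page cannot represent
is lost in the conversion.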


Yuriy.


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Michael Van Canneyt



On Tue, 26 Feb 2019, Yuriy Sydorov wrote:

> On 25.02.2019 20:03, Bart wrote:
>
> > On Lazarus, this is no problem, since by default all strings are UTF8
> > encoded, so all conversions are lossless.
> >
> > In a plain fpc program though on Windows, default encoding is the
> > current codepage (cp1252 in my case) and information will get lost
> > when you process the result further.
>
> If you just recompile an unmodified non-Unicode app for Windows, string data
> loss is inevitable if the data is not encoded using the current ANSI
> encoding, even if you use UnicodeString parameters and results for the
> registry functions.
>
> Just adding a single line to the beginning of the app will solve the data
> loss:
>
> DefaultSystemCodePage:=CP_UTF8;
>
> I would leave the registry parameters and results as "string", as is done
> in other parts of the RTL/FCL.
> For performance-critical functions it is possible to add overloaded
> versions which accept UnicodeString parameters. IMO registry access is not
> time-critical.


My proposal exactly.

But inner workings can be made to use Unicode, because the underlying APIs
are using unicode: The *W registry calls on windows, XML DOM on other systems.

Michael.


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Marco van de Voort


On 2/26/2019 at 11:04 AM, Michael Van Canneyt wrote:

> If I understood the OP correctly, he wants to change the use of "string"
> arguments in the public API to unicodestring.
>
> That changes a lot.
>
> Contrary to popular belief, the conversion will not automatically be
> correct, and will produce errors.
>
> (See e.g. https://bugs.freepascal.org/view.php?id=35113
> for a similar situation where part of the error is that the lazarus
> user must explicitly call Utf8Decode.)

(That seems to indicate that it is a utf8string, and not "string" with
the Lazarus DefaultSystemCodePage set to utf8. I can imagine something
wrongly inserting a conversion over "string" and thus mangling the result.)

> So my proposal is to leave the public API as-is, using string, adding
> unicode string overloads where possible/useful.
>
> Internally, convert to whatever fits best.
>
> if the internal routines are easier to maintain/understand if they use
> unicode string throughout: refactor them to use unicode.

Leave as is:  only works with lazarus hack. Other people wo

Upgrade to unicodestring: works for both cases (regardless of
DefaultSystemCodePage)






Re: [fpc-devel] Good timing metric test program?

2019-02-26 Thread J. Gareth Moreton
 Speaking of the optimiser overhaul, what are timings like for others?

 Gareth aka. Kit


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Marco van de Voort


On 2/25/2019 at 9:27 PM, Michael Van Canneyt wrote:

> > I'm currently involved in some TRegistry bugs and regressions.
> >
> > Personally I don't use TRegistry in any of my programs.
> > Also I mostly use Lazarus, so most of the issues don't affect me.
> >
> > However I would like to share some observations and thoughts.
> >
> > TRegistry on Windows now (3.2+) uses the Unicode API.
> > String input parameters in the various methods get "promoted" to
> > Unicode and then the API is called.
> > Returned string values however are mostly encoded in UTF8, by
> > explicitly calling Utf8Encode(SomeUnicodeString).
> > Is that (enforcing UTF8 encoding) by design?
> > (The Ansi to Unicode conversion was done via UTF8Decode, which is
> > definitely wrong and is fixed by now.)
> >
> > On Lazarus, this is no problem, since by default all strings are UTF8
> > encoded, so all conversions are lossless.
>
> I think Lazarus users are the main TRegistry users, so I would keep the
> current behaviour for the public API. Where possible add overloads that
> use a unicodestring, and let the UTF8 one call the unicode one.

The current situation does not improve anything for Lazarus users that
set the default encoding to utf8 (aka utf8hack).

If I look into e.g. registry.pp, the only use of utf8encode there is
like this:

  var s : string;
      u : unicodestring;

  s:=utf8encode(u);

which, IF Lazarus is used in the default utf8 mode, is equivalent to

  s:=u;

So currently this utf8encode only frustrates the situation for people
that don't set the default codepage to utf8?

If I'm wrong, what is the exact behaviour that you want to keep?



Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Michael Van Canneyt



On Tue, 26 Feb 2019, Marco van de Voort wrote:

> On 2019-02-26 at 19:27, Mattias Gaertner via fpc-devel wrote:
>
> > Perhaps it would be better to change TXmlRegistry to UnicodeString?
>
> I think that would be best, yes.

That was what I meant with 'switch the internals to unicodestring'.

Michael.


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Bart
On Tue, Feb 26, 2019 at 2:12 PM Michael Van Canneyt
 wrote:

> But inner workings can be made to use Unicode, because the underlying APIs
> are using unicode: The *W registry calls on windows, XML DOM on other systems.

Well, my argument is that since we interface explicitly with a
UnicodeString API, then why not expose that to the programmer?

-- 
Bart


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Mattias Gaertner via fpc-devel
On Tue, 26 Feb 2019 19:14:41 +0100
Bart  wrote:

> On Tue, Feb 26, 2019 at 2:12 PM Michael Van Canneyt
>  wrote:
> 
> > But inner workings can be made to use Unicode, because the
> > underlying APIs are using unicode: The *W registry calls on
> > windows, XML DOM on other systems.  
> 
> Well, my argument is that since we interface explicitly with a
> UnicodeString API, then why not expose that to the programmer?

Note: TRegistry uses a UTF-16 API only on Windows.
On non-Windows systems it uses TXmlRegistry, which uses AnsiString and
internally uses TXMLDocument, which uses UnicodeString.

Perhaps it would be better to change TXmlRegistry to UnicodeString?

Mattias


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Bart
On Tue, Feb 26, 2019 at 11:04 AM Michael Van Canneyt
 wrote:


> If I understood the OP correctly, he wants to change the use of "string"
> arguments in the public API to unicodestring.
>
> That changes a lot.

And that's why I asked here first.

> (See e.g. https://bugs.freepascal.org/view.php?id=35113
> for a similar situation where part of the error is that the lazarus
> user must explicitly call Utf8Decode.)

In Lazarus (with UTF8hack), assigning a UnicodeString to an AnsiString
will call Utf16ToUtf8 on the implicit conversion.
So explicitly calling Utf8Encode should not be necessary.

> So my proposal is to leave the public API as-is, using string, adding

This leaves my initial "itch": input strings are CP_ACP (so can be
anything), output strings are always CP_UTF8.
Why do we convert:
  SomeUnicodeString := SomeAnsiString
    (implicit conversion using the WideStringManager)
but
  SomeAnsiString := Utf8Encode(SomeUnicodeString)
    (explicit conversion bypassing the WideStringManager)?

IMHO this is rather inconsistent and it makes no sense from the
viewpoint of "pure" freepascal users.
(Again: Lazarus users don't care one way or the other.)

E.g. compare that to FindFirst with AnsiString: it implicitly
converts Ansi- to UnicodeString and lets the WideStringManager handle
the conversion back to AnsiString.
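
To make the asymmetry concrete, here is a small sketch (not taken from the
TRegistry sources; variable names are made up). It converts the same
UnicodeString both ways and prints the byte lengths. The Utf8Encode result
does not depend on DefaultSystemCodePage, while the implicit assignment
does. On Unix-like systems the sketch assumes the cwstring unit as
widestring manager.

```pascal
program encodedemo;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // widestring manager for the implicit conversion on Unix
{$endif}

var
  U: UnicodeString;
  ViaUtf8Encode, ViaImplicit: AnsiString;
begin
  // "caf" + e-acute, built from a code point to stay independent of the
  // source file encoding.
  U := 'caf';
  U := U + WideChar($00E9);

  // Explicit: always yields UTF-8 bytes, whatever the default code page is.
  ViaUtf8Encode := Utf8Encode(U);

  // Implicit: goes through the widestring manager and targets
  // DefaultSystemCodePage (e.g. cp1252 in a plain Windows console program).
  ViaImplicit := U;

  WriteLn('DefaultSystemCodePage:   ', DefaultSystemCodePage);
  WriteLn('Utf8Encode byte length:  ', Length(ViaUtf8Encode));  // 5: e-acute takes 2 bytes
  WriteLn('implicit byte length:    ', Length(ViaImplicit));
end.
```

When DefaultSystemCodePage is CP_UTF8 the two results hold the same bytes,
which is why Lazarus users do not notice the difference. With a single-byte
ANSI code page the two strings differ.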

> unicode string overloads where possible/useful.

That would mean overloading almost all methods.

Bart


Re: [fpc-devel] TRegistry and Unicode

2019-02-26 Thread Marco van de Voort


On 2019-02-26 at 19:27, Mattias Gaertner via fpc-devel wrote:

> Perhaps it would be better to change TXmlRegistry to UnicodeString?

I think that would be best, yes.


Re: [fpc-devel] Good timing metric test program?

2019-02-26 Thread J. Gareth Moreton
 Thanks George,
 I've finished and debugged my optimizer overhaul, although performance
varies.  It is predominantly faster than the existing peephole optimizer,
but not always as fast as I'd like (no more than a few seconds).  I figure
I might have introduced one or two bottlenecks during my debugging.  I'll
probably need some kind of profiling tool to figure out where the slowdowns
are, but I'm just glad that I've got it working at last!

 If people are willing to make -O1 and -O2 slightly worse when it comes to
optimisation, or at least -O1, then I can make it faster still, but one of
my philosophies with this overhaul was not to make the compiled code any
worse on -O1 and -O2... only equal or better.

 Gareth aka. Kit

 On Tue 26/02/19 12:46 , "George Bakhtadze" armorcava...@yandex.com sent:
> Gareth,
>
> First of all, thanks for working on compiler optimizations. I think it's
> very important.
>
> As for a benchmark, there is a simple 3D ray tracer benchmark written in
> several languages, including Pascal.
> AFAIR the Pascal version is almost as fast as the Java one and slightly
> faster than the Javascript one.
> There is a forum topic about it with source code:
> http://forum.lazarus.freepascal.org/index.php/topic,35700.0.html
> There are also some optimizations made by hand, e.g. loop unrolling. They
> may help to analyze optimizer results.
>
> ---
> Best Regards, George
>
> 25.02.2019, 17:54, "J. Gareth Moreton":
> > Given my recent work with the peephole optimizer, one thing that sprang
> > to mind is that I don't have a project that tests for performance gains
> > in a 'real world' program, where little optimisations add up over time.
> > Given that my x86-64 optimizer overhaul is rather substantial and makes
> > a lot of improvements when it comes to conditional jumps and code
> > efficiency, is there a benchmark that could be used to show the
> > performance improvement compared to the trunk?  There are small ones
> > that test individual components, but nothing substantially large that
> > I'm aware of.

 
