Re: [fpc-devel] TRegistry and Unicode
On Tue, 26 Feb 2019, Marco van de Voort wrote:
> On 2/25/2019 at 9:27 PM, Michael Van Canneyt wrote:
>>> I'm currently involved in some TRegistry bugs and regressions. Personally I don't use TRegistry in any of my programs. Also I mostly use Lazarus, so most of the issues don't affect me. However, I would like to share some observations and thoughts.
>>> TRegistry on Windows now (3.2+) uses the Unicode API. String input parameters in the various methods get "promoted" to Unicode and then the API is called. Returned string values, however, are mostly encoded in UTF8, by explicitly calling Utf8Encode(SomeUnicodeString). Is that (enforcing UTF8 encoding) by design? (The Ansi to Unicode conversion was done via UTF8Decode, which is definitely wrong and has been fixed by now.)
>>> On Lazarus, this is no problem, since by default all strings are UTF8 encoded, so all conversions are lossless.
>> I think Lazarus users are the main TRegistry users, so I would keep the current behaviour for the public API. Where possible, add overloads that use a unicodestring, and let the UTF8 one call the unicode one.
> The current situation does not improve anything for Lazarus users that set the default encoding to utf8 (aka utf8hack).
> If I look into e.g. registry.pp, the only use of utf8encode there is like this:
>
>   var
>     s : string;
>     u : unicodestring;
>
>   s := utf8encode(u);
>
> which, IF Lazarus is used in the default utf8 mode, is equivalent to
>
>   s := u;
>
> So currently this utf8encode only frustrates the situation for people that don't set the default codepage to utf8? If I'm wrong, what is the exact behaviour that you want to keep?

If I understood the OP correctly, he wants to change the use of "string" arguments in the public API to unicodestring.

That changes a lot. Contrary to popular belief, the conversion will not automatically be correct, and will produce errors. (See e.g. https://bugs.freepascal.org/view.php?id=35113 for a similar situation where part of the error is that the Lazarus user must explicitly call Utf8Decode.)
So my proposal is to leave the public API as-is, using string, adding unicode string overloads where possible/useful. Internally, convert to whatever fits best. If the internal routines are easier to maintain/understand when they use unicode string throughout: refactor them to use unicode.

Michael.
___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
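The proposal above (keep the string API, add UnicodeString overloads, convert internally) might look roughly like this. It is a minimal sketch assuming FPC 3.x with {$mode objfpc}{$H+}; TDemoStore is a hypothetical stand-in that keeps one name/value pair in memory instead of calling the registry, so it is not the real TRegistry API. The UnicodeString overloads do the work, and the string overloads only delegate, relying on the RTL's implicit conversions instead of a hard-coded Utf8Encode.

```pascal
program OverloadSketch;

{$mode objfpc}{$H+}

type
  { Hypothetical stand-in for a registry-like class; this is NOT the real
    TRegistry API. It stores a single name/value pair in memory. }
  TDemoStore = class
  private
    FName: UnicodeString;
    FValue: UnicodeString;
  public
    { UnicodeString overloads do the real work; a real implementation
      would call the UTF-16, W-suffixed registry APIs here. }
    procedure WriteString(const AName, AValue: UnicodeString); overload;
    function ReadString(const AName: UnicodeString): UnicodeString; overload;
    { string (AnsiString) overloads keep the existing public API and delegate. }
    procedure WriteString(const AName, AValue: string); overload;
    function ReadString(const AName: string): string; overload;
  end;

procedure TDemoStore.WriteString(const AName, AValue: UnicodeString);
begin
  FName := AName;
  FValue := AValue;
end;

function TDemoStore.ReadString(const AName: UnicodeString): UnicodeString;
begin
  if AName = FName then
    Result := FValue
  else
    Result := '';
end;

procedure TDemoStore.WriteString(const AName, AValue: string);
begin
  { The typecasts convert from the strings' code page to UTF-16. }
  WriteString(UnicodeString(AName), UnicodeString(AValue));
end;

function TDemoStore.ReadString(const AName: string): string;
begin
  { The UnicodeString result is converted back implicitly by the widestring
    manager, using DefaultSystemCodePage, not by a hard-coded Utf8Encode. }
  Result := ReadString(UnicodeString(AName));
end;

var
  Store: TDemoStore;
  Key, Value: string;
begin
  Store := TDemoStore.Create;
  try
    Key := 'Greeting';
    Value := 'Hello';
    Store.WriteString(Key, Value);    { resolves to the string overload }
    WriteLn(Store.ReadString(Key));   { prints: Hello }
  finally
    Store.Free;
  end;
end.
```

Existing callers that pass string keep compiling unchanged, while callers that already hold UnicodeString data skip the AnsiString round trip. The cost, as Bart points out later in the thread, is that TRegistry would need such an overload for almost every method.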
Re: [fpc-devel] Good timing metric test program?
Gareth,

I like very much what you are doing with compiler optimizations. From my point of view, execution speed is the most valuable, but of course a 17% increase in compilation speed is worth it!

If you want a big test case, and potentially to find regressions, you may try the TestSQL3 project of our Open Source mORMot. It performs millions of checks, and timings of individual test cases are available in the console log, for comparison.
See https://github.com/synopse/mORMot/blob/master/SQLite3/TestSQL3.lpi
How to set up mORMot is detailed in https://synopse.info/files/html/Synopse%20mORMot%20Framework%20SAD%201.18.html#TITL_125

If you find only some parts of it interesting for your performance tests (perhaps the SQLite3 tests, or the encryption code written in asm, won't benefit from FPC optimization work), just ask and I could extract some test cases for your purpose.

Great work!
Arnaud
Re: [fpc-devel] TRegistry and Unicode
On 25.02.2019 20:03, Bart wrote:
> On Lazarus, this is no problem, since by default all strings are UTF8 encoded, so all conversions are lossless.
> In a plain fpc program on Windows, though, the default encoding is the current codepage (cp1252 in my case) and information will get lost when you process the result further.

If you just recompile an unmodified non-unicode app for Windows, string data loss is imminent if the data is not encoded using the current ANSI encoding. This happens even if you use UnicodeString parameters and results for the registry functions.

Just adding a single line to the beginning of the app will solve the data loss:

  DefaultSystemCodePage:=CP_UTF8;

I would leave the registry parameters and results as "string", as is done in other parts of the RTL/FCL. For performance-critical functions it is possible to add overloaded versions that accept UnicodeString parameters. IMO registry access is not time critical.

Yuriy.
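Yuriy's one-line fix can be tried with a small program like the following. It is a sketch assuming FPC 3.x (where DefaultSystemCodePage and CP_UTF8 are declared in the System unit) and {$mode objfpc}{$H+}; the program name and the two sample characters are illustrative only. Both characters are outside cp1252, so with DefaultSystemCodePage set to CP_UTF8 the conversion to string should produce UTF-8 bytes and the round trip back to UnicodeString should be lossless. Without that line, on a cp1252 system, one would expect the round trip to fail.

```pascal
program CodePageSketch;

{$mode objfpc}{$H+}

var
  U: UnicodeString;
  S: string;  { AnsiString with declared code page CP_ACP }
begin
  { The line Yuriy suggests: make the "ANSI" code page UTF-8 for the
    whole program before any string conversions take place. }
  DefaultSystemCodePage := CP_UTF8;

  { Two characters that cp1252 cannot represent. }
  SetLength(U, 2);
  U[1] := WideChar($03A9);  { GREEK CAPITAL LETTER OMEGA }
  U[2] := WideChar($042F);  { CYRILLIC CAPITAL LETTER YA }

  S := U;  { implicit conversion via the widestring manager }

  WriteLn('Code page of S: ', StringCodePage(S));
  WriteLn('Bytes in S:     ', Length(S));
  WriteLn('Lossless round trip: ', UnicodeString(S) = U);
end.
```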
Re: [fpc-devel] TRegistry and Unicode
On Tue, 26 Feb 2019, Yuriy Sydorov wrote:
> On 25.02.2019 20:03, Bart wrote:
>> On Lazarus, this is no problem, since by default all strings are UTF8 encoded, so all conversions are lossless.
>> In a plain fpc program on Windows, though, the default encoding is the current codepage (cp1252 in my case) and information will get lost when you process the result further.
> If you just recompile an unmodified non-unicode app for Windows, string data loss is imminent if the data is not encoded using the current ANSI encoding. This happens even if you use UnicodeString parameters and results for the registry functions.
> Just adding a single line to the beginning of the app will solve the data loss:
>
>   DefaultSystemCodePage:=CP_UTF8;
>
> I would leave the registry parameters and results as "string", as is done in other parts of the RTL/FCL. For performance-critical functions it is possible to add overloaded versions that accept UnicodeString parameters. IMO registry access is not time critical.

My proposal exactly. But the inner workings can be made to use Unicode, because the underlying APIs use Unicode: the *W registry calls on Windows, the XML DOM on other systems.

Michael.
Re: [fpc-devel] TRegistry and Unicode
On 2/26/2019 at 11:04 AM, Michael Van Canneyt wrote:
> If I understood the OP correctly, he wants to change the use of "string" arguments in the public API to unicodestring.
> That changes a lot. Contrary to popular belief, the conversion will not automatically be correct, and will produce errors. (See e.g. https://bugs.freepascal.org/view.php?id=35113 for a similar situation where part of the error is that the Lazarus user must explicitly call Utf8Decode.)

(That seems to indicate that it is about utf8string, and not "string" with Lazarus' DefaultSystemCodePage set to utf8. I can imagine something going wrong when a conversion is inserted over "string", thus mangling the result.)

> So my proposal is to leave the public API as-is, using string, adding unicode string overloads where possible/useful. Internally, convert to whatever fits best. If the internal routines are easier to maintain/understand when they use unicode string throughout: refactor them to use unicode.

Leave as is: only works with the Lazarus hack, not for other people.
Upgrade to unicodestring: works for both cases (regardless of DefaultSystemCodePage).
Re: [fpc-devel] Good timing metric test program?
Speaking of the optimiser overhaul, what are timings like for others?

Gareth aka. Kit
Re: [fpc-devel] TRegistry and Unicode
On 2/25/2019 at 9:27 PM, Michael Van Canneyt wrote:
>> I'm currently involved in some TRegistry bugs and regressions. Personally I don't use TRegistry in any of my programs. Also I mostly use Lazarus, so most of the issues don't affect me. However, I would like to share some observations and thoughts.
>> TRegistry on Windows now (3.2+) uses the Unicode API. String input parameters in the various methods get "promoted" to Unicode and then the API is called. Returned string values, however, are mostly encoded in UTF8, by explicitly calling Utf8Encode(SomeUnicodeString). Is that (enforcing UTF8 encoding) by design? (The Ansi to Unicode conversion was done via UTF8Decode, which is definitely wrong and has been fixed by now.)
>> On Lazarus, this is no problem, since by default all strings are UTF8 encoded, so all conversions are lossless.
> I think Lazarus users are the main TRegistry users, so I would keep the current behaviour for the public API. Where possible, add overloads that use a unicodestring, and let the UTF8 one call the unicode one.

The current situation does not improve anything for Lazarus users that set the default encoding to utf8 (aka utf8hack).

If I look into e.g. registry.pp, the only use of utf8encode there is like this:

  var
    s : string;
    u : unicodestring;

  s := utf8encode(u);

which, IF Lazarus is used in the default utf8 mode, is equivalent to

  s := u;

So currently this utf8encode only frustrates the situation for people that don't set the default codepage to utf8?

If I'm wrong, what is the exact behaviour that you want to keep?
Re: [fpc-devel] TRegistry and Unicode
On Tue, 26 Feb 2019, Marco van de Voort wrote:
> On 2019-02-26 at 19:27, Mattias Gaertner via fpc-devel wrote:
>> Perhaps it would be better to change TXmlRegistry to UnicodeString?
> I think that would be best, yes.

That was what I meant with 'switch the internals to unicodestring'.

Michael.
Re: [fpc-devel] TRegistry and Unicode
On Tue, Feb 26, 2019 at 2:12 PM Michael Van Canneyt wrote:
> But the inner workings can be made to use Unicode, because the underlying APIs
> use Unicode: the *W registry calls on Windows, the XML DOM on other systems.

Well, my argument is that since we interface explicitly with a UnicodeString API, why not expose that to the programmer?

--
Bart
Re: [fpc-devel] TRegistry and Unicode
On Tue, 26 Feb 2019 19:14:41 +0100 Bart wrote:
> On Tue, Feb 26, 2019 at 2:12 PM Michael Van Canneyt wrote:
>> But the inner workings can be made to use Unicode, because the
>> underlying APIs use Unicode: the *W registry calls on
>> Windows, the XML DOM on other systems.
> Well, my argument is that since we interface explicitly with a
> UnicodeString API, why not expose that to the programmer?

Note: TRegistry uses a UTF-16 API only on Windows. On non-Windows systems it uses TXmlRegistry, which uses AnsiString and internally uses TXMLDocument, which uses UnicodeString.

Perhaps it would be better to change TXmlRegistry to UnicodeString?

Mattias
Re: [fpc-devel] TRegistry and Unicode
On Tue, Feb 26, 2019 at 11:04 AM Michael Van Canneyt wrote:
> If I understood the OP correctly, he wants to change the use of "string"
> arguments in the public API to unicodestring.
>
> That changes a lot.

And that's why I asked here first.

> (See e.g. https://bugs.freepascal.org/view.php?id=35113
> for a similar situation where part of the error is that the Lazarus
> user must explicitly call Utf8Decode.)

In Lazarus (with the UTF8 hack), assigning a UnicodeString to an AnsiString will call Utf16ToUtf8 on the implicit conversion. So explicitly calling Utf8Encode should not be necessary.

> So my proposal is to leave the public API as-is, using string, adding

This leaves my initial "itch": input strings are CP_ACP (so they can be anything), output strings are always CP_UTF8. Why do we convert

  SomeUnicodeString := SomeAnsiString   (implicit conversion using the WideStringManager)

but

  SomeAnsiString := Utf8Encode(SomeUnicodeString)   (explicit conversion bypassing the WideStringManager)?

IMHO this is rather inconsistent, and it makes no sense from the viewpoint of "pure" Free Pascal users. (Again: Lazarus users don't care one way or the other.) E.g. compare that to FindFirst with AnsiString: it implicitly converts AnsiString to UnicodeString and lets the WideStringManager handle the conversion back to AnsiString.

> unicode string overloads where possible/useful.

That would mean overloading almost all methods.

Bart
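The asymmetry described above can be observed with a small program along these lines. It is a sketch assuming FPC 3.x and {$mode objfpc}{$H+}; the program and variable names are illustrative. Both results are declared as string, but one is produced by the widestring manager (targeting DefaultSystemCodePage), while Utf8Encode always yields UTF-8 bytes. The actual output depends on the platform and its default code page; on a Windows system whose ANSI code page is cp1252, one would expect the implicit path to report 1252 and the Utf8Encode path 65001 (CP_UTF8), which is the mismatch in question.

```pascal
program ConversionSketch;

{$mode objfpc}{$H+}

var
  U: UnicodeString;
  ViaManager, ViaUtf8Encode: string;  { both declared as AnsiString (CP_ACP) }
begin
  SetLength(U, 1);
  U[1] := WideChar($00E9);  { LATIN SMALL LETTER E WITH ACUTE }

  { Implicit conversion: the widestring manager targets DefaultSystemCodePage,
    the same mechanism used for the input parameters. }
  ViaManager := U;

  { Explicit conversion as in registry.pp: Utf8Encode always yields UTF-8
    bytes, whatever DefaultSystemCodePage is. }
  ViaUtf8Encode := Utf8Encode(U);

  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('Implicit:   code page ', StringCodePage(ViaManager),
          ', ', Length(ViaManager), ' byte(s)');
  WriteLn('Utf8Encode: code page ', StringCodePage(ViaUtf8Encode),
          ', ', Length(ViaUtf8Encode), ' byte(s)');
end.
```

With DefaultSystemCodePage set to CP_UTF8 (the Lazarus default), both lines should report the same code page, which is why Lazarus users do not notice the difference.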
Re: [fpc-devel] TRegistry and Unicode
On 2019-02-26 at 19:27, Mattias Gaertner via fpc-devel wrote:
> Perhaps it would be better to change TXmlRegistry to UnicodeString?

I think that would be best, yes.
Re: [fpc-devel] Good timing metric test program?
Thanks George,

I've finished and debugged my optimizer overhaul, although performance varies. It is predominantly faster than the existing peephole optimizer, but not always as fast as I'd like (no more than a few seconds). I figure I might have introduced one or two bottlenecks during my debugging. I'll probably need some kind of profiling tool to figure out where the slowdowns are, but I'm just glad that I've got it working at last!

If people are willing to make -O1 and -O2 slightly worse when it comes to optimisation, or at least -O1, then I can make it faster still, but one of my philosophies with this overhaul was not to make the compiled code any worse on -O1 and -O2... only equal or better.

Gareth aka. Kit

On Tue 26/02/19 12:46, "George Bakhtadze" armorcava...@yandex.com sent:
> Gareth,
> First of all, thanks for working on compiler optimizations. I think it's very important.
> As for benchmarks, there is a simple 3D ray tracer benchmark written in several languages, including Pascal. AFAIR, the Pascal version is almost as fast as the Java one and slightly faster than the JavaScript one.
> There is a forum topic about it with source code: http://forum.lazarus.freepascal.org/index.php/topic,35700.0.html [1]
> There are also some optimizations made by hand, e.g. loop unrolling. It may help to analyze optimizer results.
> ---
> Best Regards,
> George
>
> 25.02.2019, 17:54, "J. Gareth Moreton":
>> Given my recent work with the peephole optimizer, one thing that sprang to mind is that I don't have a project that tests for performance gains in a 'real world' program, where little optimisations add up over time. Given that my x86-64 optimizer overhaul is rather substantial and makes a lot of improvements when it comes to conditional jumps and code efficiency, is there a benchmark that could be used to show the performance improvement compared to the trunk? There are small ones that test individual components, but nothing substantially large that I'm aware of.

Links:
[1] http://forum.lazarus.freepascal.org/index.php/topic%2C35700.0.html