Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-24 Thread Jonas Maebe
vfclists . wrote: Isn't some formality in these Unicode discussions called for? Use of everyday language to express things which can only be properly expressed and tested through source code is very confusing. The formal definitions can be found at

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-24 Thread vfclists .
Isn't some formality in these Unicode discussions called for? Use of everyday language to express things which can only be properly expressed and tested through source code is very confusing. Consider these few sentences by Mattias It depends. There are two codepages. The real one and the

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-21 Thread Michael Schnell
On 04/20/2016 11:26 AM, Jonas Maebe wrote: The reasons are Thanks a lot for the great explanation. So there are good reasons to stay with the status quo (unless doing a completely new versatile and straightforward String implementation that exceeds functionality and "mind" Delphi allows,

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-20 Thread Jonas Maebe
Michael Schnell wrote on Tue, 19 Apr 2016: On 04/19/2016 08:22 AM, Jonas Maebe wrote: When any {$codepage xxx} directive is specified, string constants in the source are represented in a way that makes lossless conversion to any other code page possible. This conversion to the target

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-20 Thread Michael Schnell
BTW.: http://www.freepascal.org/docs-html/rtl/system/defaultsystemcodepage.html says that DefaultSystemcodepage can be modified in the user code at runtime. I suppose that will change the way strings with StringCodePage() = CP_ACP are handled. I'll do some tests... -Michael
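A minimal sketch of the kind of test mentioned here, assuming FPC 3.0 or later. It only prints DefaultSystemCodePage before and after a runtime change through the RTL procedure SetMultiByteConversionCodePage; the program name is made up for illustration, and the initial value depends on the platform and locale.

```pascal
program ShowDefaultCodePage;
{$mode objfpc}{$H+}

begin
  // Value chosen by the RTL at program startup (platform and locale dependent).
  WriteLn('Before: ', DefaultSystemCodePage);

  // Change the default code page at run time via the System unit procedure.
  SetMultiByteConversionCodePage(CP_UTF8);

  // Strings declared with code page CP_ACP follow this setting from now on.
  WriteLn('After:  ', DefaultSystemCodePage);
end.
```

Strings that were already assigned before the change keep whatever bytes they hold, so a change like this is probably best done as early as possible in the program.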

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-20 Thread Michael Schnell
On 04/19/2016 08:22 AM, Jonas Maebe wrote: When any {$codepage xxx} directive is specified, string constants in the source are represented in a way that makes lossless conversion to any other code page possible. This conversion to the target code page is performed at compile time where

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-20 Thread Michael Schnell
On 04/19/2016 08:22 AM, Jonas Maebe wrote: No, it does not. Please tell me which sentence of http://wiki.freepascal.org/FPC_Unicode_support#String_constants suggests that in any way. I just was making fun of myself, naively supposing the contrary :-) ;-) -Michael

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-19 Thread Jonas Maebe
Michael Schnell wrote: On 04/16/2016 11:02 AM, Mattias Gaertner wrote: For instance using {$codepage utf8} tells the compiler to convert all your literals to UTF-16. Without the {$codepage} the compiler preserves the real codepage. I.e. (compiling in a UTF-8 based Linux) - using {$codepage

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-19 Thread Michael Schnell
On 04/16/2016 11:02 AM, Mattias Gaertner wrote: StringCodePage on a literal is pretty useless. You should use StringCodePage on variables. Just exploring how the compiler works... -Michael

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-19 Thread Michael Schnell
On 04/16/2016 11:02 AM, Mattias Gaertner wrote: For instance using {$codepage utf8} tells the compiler to convert all your literals to UTF-16. Without the {$codepage} the compiler preserves the real codepage. I.e. (compiling in a UTF-8 based Linux) - using {$codepage utf8} tells the compiler

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-19 Thread Michael Schnell
On 04/16/2016 10:47 AM, Mattias Gaertner wrote: That's correct. String literals in a codepage other than system are stored as UTF-16 in the binary (Assuming with "other than system" you mean different from the DefaultSystemcodepage setting the compiler sees at its runtime). I see. And of

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-16 Thread Jonas Maebe
Mattias Gaertner wrote: That's correct. String literals in a codepage other than system are stored as UTF-16 in the binary and converted on assign. The conversion happens at runtime, so the string codepage is decided at runtime. That's correct if the assignment is to a variable/parameter that

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-16 Thread Mattias Gaertner
On Fri, 15 Apr 2016 10:43:55 +0200 Michael Schnell wrote: >[...] > Do you suggest that the codepage of the sourcecode is preserved by the > compiler when creating the string constant in object code ? It depends. There are two codepages. The real one and the one you tell

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-16 Thread Mattias Gaertner
On Fri, 15 Apr 2016 10:19:06 +0200 Michael Schnell wrote: > On 04/15/2016 08:35 AM, Michael Van Canneyt wrote: > > > > For string constants there are slightly different rules. There the > > result depends on the {$codepage} directive of the source file. > > Hmmm. > > If

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-16 Thread Michael Schnell
On 04/15/2016 08:35 AM, Michael Van Canneyt wrote: For string constants there are slightly different rules. There the result depends on the {$codepage} directive of the source file. Hmmm. If not setting $codepage for a constant string I get StringCodePage = 0, If setting {$codepage UTF8}

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-16 Thread Michael Schnell
On 04/15/2016 10:32 AM, Graeme Geldenhuys wrote: If you as a programmer know the unit is saved in UTF-8 encoding, then add {$codepage utf8} to the top of the unit. That tells the compiler how to interpret string constants in that unit (without the need for any guessing). I did some test

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-15 Thread Graeme Geldenhuys
On 2016-04-14 09:16, Michael Schnell wrote: > For a test I did result := StringCodePage('äü'); If you as a programmer know the unit is saved in UTF-8 encoding, then add {$codepage utf8} to the top of the unit. That tells the compiler how to interpret string constants in that unit (without the
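A small sketch of the experiment under discussion, assuming FPC 3.0 or later and a source file that really is saved as UTF-8. Following the advice elsewhere in the thread, it calls StringCodePage on variables rather than on a bare literal. The program and variable names are illustrative only, and the printed values depend on the platform and on DefaultSystemCodePage, so none are claimed here.

```pascal
program LiteralCodePage;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

var
  S: AnsiString;   // declared code page CP_ACP, follows DefaultSystemCodePage
  U: UTF8String;   // declared code page CP_UTF8

begin
  // The same literal is assigned to two differently declared string types.
  S := 'äü';
  U := 'äü';

  // Report the dynamic code page each variable ended up with.
  WriteLn('AnsiString: ', StringCodePage(S));
  WriteLn('UTF8String: ', StringCodePage(U));
end.
```

Compiling the same file once with and once without the {$codepage utf8} line is one way to compare the two cases described in these messages.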

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-15 Thread Michael Van Canneyt
On Thu, 14 Apr 2016, Michael Schnell wrote: On 04/14/2016 08:52 AM, Michael Van Canneyt wrote: The default encoding for the string type is determined at run-time, not at compile time. How can that work for string constants ? Will they in fact (virtually) change their encoding when

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-15 Thread Michael Schnell
On 04/14/2016 08:52 AM, Michael Van Canneyt wrote: The default encoding for the string type is determined at run-time, not at compile time. How can that work for string constants ? Will they in fact (virtually) change their encoding when DefaultSystemcodepage is different ? For a test I

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-14 Thread Michael Van Canneyt
On Wed, 13 Apr 2016, Michael Schnell wrote: On 04/13/2016 09:04 AM, Michael Van Canneyt wrote: It uses the DefaultSystemcodepage. If the system codepage is UTF8, then it will use UTF8. (Sorry for replying yet another answer to the same message of yours)

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-14 Thread Michael Schnell
On 04/13/2016 09:04 AM, Michael Van Canneyt wrote: It uses the DefaultSystemcodepage. If the system codepage is UTF8, then it will use UTF8. (Sorry for replying yet another answer to the same message of yours) http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus says: On the other

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-14 Thread Michael Schnell
On 04/13/2016 09:04 AM, Michael Van Canneyt wrote: It uses the DefaultSystemcodepage. If the system codepage is UTF8, then it will use UTF8. Thanks for the enlightenment. Am I right assuming that the DefaultSystemcodepage is determined when compiling the RTL and/or the compiler? (As the

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-13 Thread Michael Van Canneyt
On Tue, 12 Apr 2016, Michael Schnell wrote: On 04/04/2016 11:27 AM, Juha Manninen wrote: Just use the new UTF-8 mode provided by Lazarus and remove all explicit conversion functions. http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus I just did some tests and it seems that

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-13 Thread Michael Schnell
On 04/04/2016 11:27 AM, Juha Manninen wrote: Just use the new UTF-8 mode provided by Lazarus and remove all explicit conversion functions. http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus I just did some tests and it seems that TStringList (this is what Tobias is concerned about)

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-05 Thread Michael Schnell
On 04/04/2016 11:27 AM, Juha Manninen wrote: On Mon, Apr 4, 2016 at 11:18 AM, wrote: I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss. You are completely lost with this issue. The

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-05 Thread Michael Schnell
On 04/04/2016 10:43 AM, tobiasgie...@gmail.com wrote: "Unicode aware Pascal code needs to set DefaultSystemCodePage to CP_UTF8". That can't be this ubiquitous. I do suppose that the default value is supposed to make sense in many cases. OTOH, if - as you seem to suggest - there is any
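To make the quoted wiki sentence concrete, here is a hedged sketch, assuming FPC 3.0 or later and a UTF-8 source file. It sets DefaultSystemCodePage to CP_UTF8 through SetMultiByteConversionCodePage and then stores a UTF-8 string in a TStringList. This is one reading of the advice given in this thread, not a confirmed fix for the original poster's code; the program and variable names are made up.

```pascal
program Utf8StringList;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

uses
  Classes;

var
  List: TStringList;
  U: UTF8String;

begin
  // Make CP_ACP strings (such as TStringList items) use UTF-8.
  SetMultiByteConversionCodePage(CP_UTF8);

  U := 'äü';
  List := TStringList.Create;
  try
    // Source and target code pages now agree, so no lossy conversion is needed.
    List.Add(U);
    WriteLn('Item code page: ', StringCodePage(List[0]));
    WriteLn('Byte length:    ', Length(List[0]));
  finally
    List.Free;
  end;
end.
```

Without the call to SetMultiByteConversionCodePage, the assignment to the list may go through the system code page, which is where question marks can appear on systems whose default code page cannot represent the characters.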

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 13:15, Sven Barth wrote: > Qt uses UTF-16 as well... I always thought that strange. After all, Qt was born as a Unix-type GUI toolkit. Unless I got my facts wrong. Then again, it's only in recent years that Unix-like systems moved to UTF-8. I think even FreeBSD didn't use UTF-8 out

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Sven Barth
On 04.04.2016 at 13:21, "Graeme Geldenhuys" < mailingli...@geldenhuys.co.uk> wrote: > > On 2016-04-04 12:06, Michael Van Canneyt wrote: > > 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString. > > On Windows, UnicodeString is more 'natural' or 'native'. > > Based on

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Jonas Maebe wrote: Michael Van Canneyt wrote on Mon, 04 Apr 2016: On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: [add LCL UTF-8 helper units to FPC] Though it could probably be added as quick as in FPC 3.0.2. It's simply two new units that need to be explicitly used by

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: On 2016-04-04 12:06, Michael Van Canneyt wrote: 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString. On Windows, UnicodeString is more 'natural' or 'native'. Based on Internet standards and most popular OSes (mobile

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 12:06, Michael Van Canneyt wrote: > 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString. > On Windows, UnicodeString is more 'natural' or 'native'. Based on Internet standards and most popular OSes (mobile devices included), UTF-8 is king - so we all know

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: more complete solution for UTF-8. This is useful for many users. They don't have to reinvent the wheel. Not having looked at the two units you mentioned... but if this is a general requirement for anybody using UTF-8 or similar with FPC 3.0, then

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:34, Mattias Gaertner wrote: > for that. In fact you don't have to use LazUtils: some users simply > copied the two units FPCAdds and LazUTF8. It's all open source. This was not made clear until you explicitly mentioned it. Juha's initial comment was vague on the matter, and the

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:40, Mattias Gaertner wrote: > Or simply copy the two units FPCAdds, LazUTF8 or parts of them from > here: Thank you Juha and Mattias - I'll take a look at those to see what they do. Regards, - Graeme

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 13:27:05 +0300 Juha Manninen wrote: >[...] > But yes, it requires Lazarus IDE because LazUtils is a Lazarus > package. At least you must create and compile the project using > Lazarus IDE. Or simply copy the two units FPCAdds, LazUTF8 or parts of

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 10:52:20 +0100 Graeme Geldenhuys wrote: > On 2016-04-04 10:27, Juha Manninen wrote: > > Just use the new UTF-8 mode provided by Lazarus and remove all > > explicit conversion functions. > > This is the FPC mailing list. Not everybody here uses

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 12:52 PM, Graeme Geldenhuys wrote: > This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so > making such a suggestion is wishful thinking. For example, your > suggestion means nothing to me, I don't use LCL. Yes, I should

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:27, Juha Manninen wrote: > Just use the new UTF-8 mode provided by Lazarus and remove all > explicit conversion functions. This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so making such a suggestion is wishful thinking. For example, your suggestion means

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 11:18 AM, wrote: > I use TStringList for UTF-8 strings. This is no longer possible, because > automatic conversions cause question marks and data loss. You are completely lost with this issue. The automatic conversion of encodings is a big step

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Jonas Maebe
tobiasgiesen wrote on Mon, 04 Apr 2016: Then please update the wiki - it is user editable. Done: http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility I hope this is correct. It is incorrect in the sense that there is nothing utf8-specific about the way your code

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
> Then please update the wiki - it is user editable. Done: http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility I hope this is correct. Cheers, Tobias

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 09:43, tobiasgie...@gmail.com wrote: > Very theoretical. What you really need to tell > people is something like this: Then please update the wiki - it is user editable. Even a seasoned developer such as myself still needs to get my head around all this FPC Unicode stuff. So any

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
> > I use TStringList for UTF-8 strings. This is no longer possible, because > > automatic conversions cause question marks and data loss. > > Lazarus uses TStringList with UTF-8 all over the place. > > Please post a complete example demonstrating the problem. Sorry - this was only theoretical,

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote: Hello, disallowing "AnsiString" code for UTF-8 is a huge regression. I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss. Same answer as in my other mail. Set

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 04 Apr 2016 10:18:18 +0200 tobiasgie...@gmail.com wrote: > Hello, > > disallowing "AnsiString" code for UTF-8 is a huge regression. > > I use TStringList for UTF-8 strings. This is no longer possible, because > automatic conversions cause question marks and data loss. Lazarus uses

[fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
Hello, disallowing "AnsiString" code for UTF-8 is a huge regression. I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss. I also use a large amount of third-party libraries that use the AnsiString data type for UTF-8.