Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 10:24 PM, Hans-Peter Diettrich wrote:
> The bad news about new Delphi strings is the addition (overload) of functions with RawByteString arguments, which *take* strings of any encoding, but then *ignore* that encoding. These versions certainly must fail for all MBCS encodings :-(

Is there an agreement on how this should work?

function parameter type ... Unicode string type coming in ... actual encoding ID of string coming in - conversion

1) RawByte ... any ... any - supposedly no, supposedly keeping the encoding ID
2) not Raw ... Raw ... $ (=RAW) - ???
3) not Raw ... same ... matching - obviously no
4) not Raw ... same ... not matching (maybe $) - this is a mismatched string; what to do?
5) not Raw ... Raw ... matching the parameter type - supposedly no
6) not Raw ... Raw ... not $ but not matching the parameter type - supposedly yes
7) not Raw ... not Raw but different ... not $, matching its own type - supposedly yes
8) not Raw ... not Raw but different ... not $, matching the type of the parameter - this is a mismatched string; not converting it would cure the mismatch
9) not Raw ... not Raw but different ... not $, matching neither - this is a mismatched string; what to do?
10) not Raw ... not Raw but different ... $ - this is a mismatched string; what to do?

Did I forget any cases?

-Michael

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
Re: [Lazarus] cwstring in arm-linux
+1 to all points.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 11:48 PM, Hans-Peter Diettrich wrote:
> What if a file on the user's computer has a 4-byte [visible] character as its 8th character and you, for example, want to get an 8-character file name? In this case you split that 4-byte character and have garbage.

For me (and for Linux) a file name does not consist of visible characters at all but is just a sequence of bytes. (AFAIK, with Ext3 any byte is allowed except zero and /.) How this byte array is presented on a screen, printed out, or obtained from a keyboard is just up to the program that communicates with the user. Thus it _might_ handle the byte string as if it were UTF-8 (unless it does not match the appropriate rules), as locale-based ANSI, or whatever. For the presentation it of course needs to adhere to the API definition of the widget set used. But this in fact has no relation to how the file system works and what the meaning of the file name really is.

-Michael
Re: [Lazarus] cwstring in arm-linux
Michael Schnell wrote:
> On 10/21/2011 10:24 PM, Hans-Peter Diettrich wrote:
>> The bad news about new Delphi strings is the addition (overload) of functions with RawByteString arguments, which *take* strings of any encoding, but then *ignore* that encoding. These versions certainly must fail for all MBCS encodings :-(
>
> Is there an agreement on how this should work?
>
> function parameter type ... Unicode string type coming in ... actual encoding ID of string coming in - conversion

I'm not sure what you mean. Whenever a UnicodeString is passed as an argument, the UnicodeString version is called, not the RawByteString version. When only AnsiString types are passed as parameters, the RawByteString version is called, and has to deal with possibly different encodings. The Delphi implementations simply ignore any encoding, so that the results are almost unusable :-(

In the AnsiStrings and StrUtils units another set of overloaded procedures is provided, for native AnsiString(CP_ACP) arguments. These versions are called only for all-native AnsiString arguments, so that no conversions are required.

> 1) RawByte ... any ... any - supposedly no, supposedly keeping the encoding ID
> 2) not Raw ... Raw ... $ (=RAW) - ???
> 3) not Raw ... same ... matching - obviously no
> 4) not Raw ... same ... not matching (maybe $) - this is a mismatched string; what to do?
> [...]

Please note that the RawByteString type is not intended as a type for variables, only for subroutine arguments. Strings of that type are either empty, with no encoding, or they hold the last assigned string, including its encoding.

DoDi
Re: [Lazarus] cwstring in arm-linux
On 10/24/2011 01:34 PM, Hans-Peter Diettrich wrote:
> I'm not sure what you mean. Whenever a UnicodeString is passed as an argument, the UnicodeString version is called, not the RawByteString version.

I'm not speaking about any existing procedures, but about those somebody might write, which thus do not have overloaded versions.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/24/2011 01:34 PM, Hans-Peter Diettrich wrote:
> Please note that the RawByteString type is not an intended type for variables, only for subroutine arguments.

I'm not speaking about how anything is intended, but asking for a definition of what the compiler is to do when these cases are detected at compile time and at run time. If it is syntactically possible, there needs to be a definition of what will happen.

-Michael
Re: [Lazarus] cwstring in arm-linux
Hi,

On 2011-10-22 00:48, Hans-Peter Diettrich wrote:
> Žilvinas Ledas wrote:
>> Hello,
>>
>> On 2011-10-21 10:43, Michael Schnell wrote:
>>> Of course you are right, but Move and friends are close-to-the-hardware programming for those who know what they are doing. But basic (legacy) string operations like myChar := myString[i] are office-level programming and thus should work as a dummy expects.
>>
>> What if a file on the user's computer has a 4-byte [visible] character as its 8th character and you, for example, want to get an 8-character file name? In this case you split that 4-byte character and have garbage.
>
> Then you (or your boss) didn't understand the meaning of 4 characters. (Logical) characters are different from physical Chars, in every MBCS codepage.

I know that logical characters are different from physical ones. I was trying to make the point that even when using UTF16 you MUST check any string coming from the outside world.

>> What if the user inputs in your text field (or a command line parameter or anywhere else) a string containing a 4-byte character and you split that string on that character? (For example when showing some kind of summary of his input.) Don't forget that the user can input characters by copy-pasting them from the web, not only using his keyboard!
>
> See above. With proportional fonts, counting characters is a bad idea; instead the width of the displayed string (in pixels) should be used. Then you also can deal with languages and character sets which use ligatures and the like. Even with monospaced fonts the characters (glyphs) can have a different width, in multiples of the basic width, e.g. for Chinese or other eastern character sets.
>
>> So, if you want to write PROFESSIONAL software with any user input, you must handle 4-byte characters at every place you get user input.
>
> Counting characters then is a bad idea, see above.
>
>> Otherwise you leave a chance to get garbage and show it to the user. Is this really easier than using UTF8 everywhere?
>>
>> My personal experience: I am maintaining (as a hobby project) a multi-language dictionary program (a screenshot: http://2.bp.blogspot.com/_3-IaodGIbVQ/TMHY-l9M4sI/Aak/AbtShWq0ZUQ/s1600/KZod_screen_win7.png
>
> Great :-)
>
>> ) and it involves quite a bit of [multilingual] string manipulation. When I migrated from Delphi to Lazarus I didn't know about the requirement that all (GUI) strings must be UTF8, and I had no problems migrating! Yes, afterwards I tweaked some calls to RTL (mostly file handling) functions that expected to get ANSI encoding, but this is not a problem of UTF8, but of the RTL being (mostly) ANSI.
>
> From which Delphi version did you migrate? What encoding did you use in Delphi?

From Delphi 5. Actually, I do not quite remember now what I was using :) I think it was a mix of ansi/wide/utf8 strings.

Regards,
Žilvinas Ledas
Re: [Lazarus] cwstring in arm-linux
On Thu, Oct 20, 2011 at 09:55:25AM +0200, Martin Schreiber wrote:
> I suppose that there will be some kind of compiler switch allowing for selecting whether or not to use the new string feature.
>
> That does not change the possibility that FPC compiler, RTL and FCL will be less stable than now because of the greater complexity and because of the possible new bugs.

There is a brand new fixes branch without those changes. Trunk has been broken heavily before early in a cycle.
Re: [Lazarus] cwstring in arm-linux
On 2011-10-20 17:30, Luca Olivetti wrote:
> Additionally, 16 bits is enough to cover the BMP, Basic Multilingual Plane, which encompasses the majority of today's most widely used languages. Only when you get to more advanced codepoints in some of the far-eastern languages, or are needing to encode dead languages such as Egyptian hieroglyphics do you need more than 16 bits.

That is such a rubbish statement! More and more information is being added outside Unicode's BMP: emoticons, science and maths symbols, map symbols (often seen in GPS applications), music notes, etc. So it's not just far-eastern or dead languages any more...

The Supplementary Planes of Unicode will be used a lot more in the near future, so UTF-16's surrogate pairs will become more commonplace. And this is where UTF-8 will shine once again, because nothing will need changing in the programmer's code: selecting a BMP or Supplementary Plane code point is identical.

Programmers using UTF-16 often don't bother checking for surrogate pairs, treating UTF-16 like UCS2 - BIG MISTAKE! This is why I think UTF-8 is a much safer choice.

Regards,
- Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
Re: [Lazarus] cwstring in arm-linux
On 2011-10-21 00:20, Hans-Peter Diettrich wrote:
> your legacy code can assume that every (visible) character is a Char, in an SBCS codepage, this is not different in UTF-16.

Rookie mistake!!! You forgot surrogate pairs in UTF-16. Think outside the Unicode BMP, where a visible character will be 4 bytes, thus two UTF-16 Char values. As I mentioned earlier, most programmers using UTF-16 treat it like UCS2, forgetting that they need to check for surrogate pairs too.

Now in UTF-8, this is not a problem at all. Finding a visible character in the BMP or Supplementary Plane is an identical process; no special checking is required. This makes UTF-8 much easier and safer to use.

I've ported enough Delphi code to FPC + fpGUI, where UTF-8 is used for Unicode support. I fully agree with Felipe: using UTF-8 is much easier with legacy code than UTF-16.

Regards,
- Graeme -
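To make the surrogate-pair point concrete, here is a minimal sketch. It is written in Python purely for illustration (the thread itself concerns Free Pascal), and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) as a sample non-BMP character is an assumption of this example, not something taken from the thread.

```python
# One code point outside the BMP: U+1D11E MUSICAL SYMBOL G CLEF (sample choice).
ch = "\U0001D11E"

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-le")  # explicit byte order, so no BOM is prepended

utf8_bytes = len(utf8)          # number of 8-bit code units
utf16_units = len(utf16) // 2   # number of 16-bit code units

# Split the UTF-16 byte string into its 16-bit code units.
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]


def is_high_surrogate(u: int) -> bool:
    """True for the first half of a surrogate pair."""
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u: int) -> bool:
    """True for the second half of a surrogate pair."""
    return 0xDC00 <= u <= 0xDFFF


print(utf8_bytes, utf16_units, [hex(u) for u in units])
```

The same visible character takes four code units in UTF-8 and two in UTF-16, so indexing by code unit can land inside a character in either encoding.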
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 04:34 PM, Felipe Monteiro de Carvalho wrote:
> Length does not give the number of chars? No problem:

As said: of course this is no problem for those who are aware that they are dealing with Unicode and not with displayed characters. This of course includes me when writing new code from scratch. But it does not include the persons I mentioned before, and it does not include me when porting legacy code.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 09:42 PM, Felipe Monteiro de Carvalho wrote:
> Changing the size of Char is not just a small detail; this breaks *a lot* of code. Any kind of memory operations such as Move will fail because the char size changed.

Of course you are right, but Move and friends are close-to-the-hardware programming for those who know what they are doing. But basic (legacy) string operations like myChar := myString[i] are office-level programming and thus should work as a dummy expects.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 09:03 AM, Graeme Geldenhuys wrote:
> You forgot surrogate pairs in UTF-16. Think outside the Unicode BMP where a visible character will be 4-bytes, thus two UTF-16 Char values.

Regarding this, there seemingly is no help at all :( (I understand that even in full 32-bit Unicode there are such pairs, forming characters that may also be defined as a completely different single 32-bit code point.)

But in fact, up to now I have never come across any situation requiring non-BMP encoding.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 08:56 AM, Graeme Geldenhuys wrote:
> That is such a rubbish statement! More and more information is being added outside the Unicode's BMP. Emoticons, Science and Maths symbols, Map Symbols (often seen in GPS applications), Music notes etc etc.

Those who deal with this of course need to know what they are doing. But those who use normal non-English (non-ASCII) languages (I understand that this is what BMP means) and don't want to deal with such things are (maybe unnecessarily) forced into hell by the UTF-8-in-AnsiString paradigm.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 05:49 PM, Mattias Gaertner wrote:
> Often they say: Linux has problems with unicode. Reason: teachers think that unicode is so simple under java, so they don't explain it.

I see. Obviously a similar problem as with Delphi. (If E. in fact (like promised some time ago) creates a Delphi that compiles for Linux, I am curious how this is handled.)

If you have students that stupid, then don't tell them about the [] operator. :) :) :)

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 10:26 PM, Felipe Monteiro de Carvalho wrote:
> Mac OS X uses the decomposed form in UTF-8 to store filenames, which is rather unpleasant.

Why are they so silly?

-Michael
Re: [Lazarus] cwstring in arm-linux
Graeme Geldenhuys wrote:
> On 2011-10-21 00:20, Hans-Peter Diettrich wrote:
>> your legacy code can assume that every (visible) character is a Char, in an SBCS codepage, this is not different in UTF-16.
>
> Rookie mistake!!! You forgot surrogate pairs in UTF-16.

Which Ansi characters translate into surrogate pairs?

> Now in UTF-8, this is not a problem at all. Finding a visible character in the BMP or Supplementary Plane is an identical process; no special checking is required. This makes UTF-8 much easier and safer to use.

Please specify "Finding"; a code snippet would be nice.

> I've ported enough Delphi code to FPC + fpGUI, where UTF-8 is used for Unicode support. I fully agree with Felipe: using UTF-8 is much easier with legacy code than UTF-16.

This only demonstrates that UTF-16 has not been supported sufficiently in FPC, until now. Give an example of UTF-8 code which would become *more* complicated with UTF-16.

DoDi
Re: [Lazarus] cwstring in arm-linux
Graeme Geldenhuys wrote:
> On 2011-10-20 17:30, Luca Olivetti wrote:
>> Additionally, 16 bits is enough to cover the BMP, Basic Multilingual Plane, which encompasses the majority of today's most widely used languages. Only when you get to more advanced codepoints in some of the far-eastern languages, or are needing to encode dead languages such as Egyptian hieroglyphics do you need more than 16 bits.
>
> That is such a rubbish statement! More and more information is being added outside the Unicode's BMP. Emoticons, Science and Maths symbols, Map Symbols (often seen in GPS applications), Music notes etc etc.

Now also tell us how application code is affected by such astral codepoints, and how these are handled more easily in UTF-8 than in UTF-16.

DoDi
Re: [Lazarus] cwstring in arm-linux
On 2011-10-21 09:50, Michael Schnell wrote:
> But in fact I up til now never came across any situation requiring non-BMP encoding.

We use the science and maths symbols defined outside the BMP all the time in our products. Why use images (old-school style) when font symbols (today's style) can do the exact same thing, but more easily? Plus you get the benefit that you can copy and paste text with maths or science symbols without problems.

Regards,
- Graeme -
Re: [Lazarus] cwstring in arm-linux
On 2011-10-21 10:19, Hans-Peter Diettrich wrote:
> Please specify Finding, a code snippet would be nice.

Knock yourself out...

https://github.com/graemeg/fpGUI/blob/master/src/corelib/fpg_stringutils.pas

Take a look at UTF8Copy() or UTF8Insert() etc.

> in FPC, until now. Give an example of UTF-8 code, which would become *more* complicated with UTF-16.

Consider a Copy() type function where you want to copy a Unicode codepoint (think single character as you see on the screen - ignoring combining diacritics for now) out from a string. UTF8Copy() as defined above will do that correctly, irrespective of whether the codepoint is in the BMP or Supplementary Plane, or whether the character is represented by 1, 2, 3 or 4 bytes.

With UTF-16 you need to check if the UTF-16 string is Little Indian or Big Indian (UTF-16BE or UTF-16LE), and whether the codepoint has a surrogate pair or not. All in all, a lot more complex than UTF-8.

Regards,
- Graeme -
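The kind of code-point-based copy described in this message can be sketched as follows. This is Python used only for illustration; `utf8_char_len` and `utf8_copy` are hypothetical names, the code assumes well-formed UTF-8 input, and it is not the fpGUI implementation of UTF8Copy().

```python
def utf8_char_len(lead: int) -> int:
    """Byte length of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1            # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2            # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3            # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4            # 11110xxx: code point outside the BMP
    raise ValueError("not a UTF-8 lead byte")


def utf8_copy(data: bytes, start: int, count: int) -> bytes:
    """Copy `count` code points, beginning at the 1-based code point index `start`."""
    pos = 0
    # Skip the code points in front of `start`.
    for _ in range(start - 1):
        if pos >= len(data):
            break
        pos += utf8_char_len(data[pos])
    begin = pos
    # Walk over the requested code points.
    for _ in range(count):
        if pos >= len(data):
            break
        pos += utf8_char_len(data[pos])
    return data[begin:pos]


sample = "a\u00e9\u20ac\U0001D11Ez".encode("utf-8")  # 1-, 2-, 3- and 4-byte sequences
print(utf8_copy(sample, 4, 1).decode("utf-8"))
```

The loop treats BMP and non-BMP code points alike because the lead byte alone gives the sequence length. Combining diacritics are ignored here, as they are in the message above.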
Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 10:09 AM, Graeme Geldenhuys wrote:
> We use the Science and Maths symbols define outside the BMP all the time in our products.

So with these projects you obviously are a Unicode-aware programmer and don't qualify for the group of office programmers who (IMHO) should be enabled to do their stuff without being forced to think about displayable-character encoding.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 2011-10-21 10:31, Michael Schnell wrote:
> So with these projects you obviously are a Unicode aware programmer and don't qualify for the group of office programmers that (IMHO)

I don't have to think about anything special when working with Unicode text. I simply use the string manipulation functions as defined in the fpGUI framework, and not the ones defined in FPC's RTL. The fpGUI framework handles the rest for me.

I'm even considering renaming all the UTF8xxx string manipulation functions in fpGUI to something like fpg (e.g. fpgCopy() or fpgInsert()), because I really don't think the programmer needs to know that fpGUI uses UTF-8 internally. If you use the string functions as defined in fpGUI, your code will work.

Regards,
- Graeme -
Re: [Lazarus] cwstring in arm-linux
On 2011-10-21 10:22, Hans-Peter Diettrich wrote:
> Now also tell us how application code is affected by such astral codepoints, and how these are handled easier in UTF-8 than in UTF-16.

So as not to repeat myself, see one of my other replies.

Regards,
- Graeme -
Re: [Lazarus] cwstring in arm-linux
Hello,

On 2011-10-21 10:43, Michael Schnell wrote:
> Of course you are right, but Move and friends are close-to-the-hardware programming for those who know what they are doing. But basic (legacy) string operations like myChar := myString[i] are office-level programming and thus should work as a dummy expects.

What if a file on the user's computer has a 4-byte [visible] character as its 8th character and you, for example, want to get an 8-character file name? In this case you split that 4-byte character and have garbage.

What if the user inputs in your text field (or a command line parameter or anywhere else) a string containing a 4-byte character and you split that string on that character? (For example when showing some kind of summary of his input.) Don't forget that the user can input characters by copy-pasting them from the web, not only using his keyboard!

So, if you want to write PROFESSIONAL software with any user input, you must handle 4-byte characters at every place you get user input. Otherwise you leave a chance to get garbage and show it to the user. Is this really easier than using UTF8 everywhere?

My personal experience: I am maintaining (as a hobby project) a multi-language dictionary program (a screenshot: http://2.bp.blogspot.com/_3-IaodGIbVQ/TMHY-l9M4sI/Aak/AbtShWq0ZUQ/s1600/KZod_screen_win7.png ) and it involves quite a bit of [multilingual] string manipulation. When I migrated from Delphi to Lazarus I didn't know about the requirement that all (GUI) strings must be UTF8, and I had no problems migrating! Yes, afterwards I tweaked some calls to RTL (mostly file handling) functions that expected to get ANSI encoding, but this is not a problem of UTF8, but of the RTL being (mostly) ANSI.

Regards,
Žilvinas Ledas
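The "8th character" scenario can be sketched for UTF-16 as follows. This is an illustrative Python sketch, not Pascal; the helper names `to_utf16_units`, `from_utf16_units` and `truncate_utf16` are hypothetical, and the sample string with U+1D11E in 8th position is an assumption made for the example.

```python
import struct


def to_utf16_units(text: str) -> list:
    """Encode text as a list of 16-bit UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def from_utf16_units(units: list) -> str:
    """Decode a list of 16-bit UTF-16 code units back to text."""
    return struct.pack("<%dH" % len(units), *units).decode("utf-16-le")


def truncate_utf16(units: list, max_units: int) -> list:
    """Keep at most `max_units` code units without cutting a surrogate pair in half."""
    if len(units) <= max_units:
        return units
    cut = max_units
    # If the last unit kept is a high surrogate, its partner would be cut off: drop it too.
    if cut > 0 and 0xD800 <= units[cut - 1] <= 0xDBFF:
        cut -= 1
    return units[:cut]


name = "abcdefg" + "\U0001D11E" + "hij"   # the 8th character lies outside the BMP
units = to_utf16_units(name)

print(len(units))                                   # code units, not visible characters
print(from_utf16_units(truncate_utf16(units, 8)))   # cut moved back before the pair
```

A plain `units[:8]` would end on a lone high surrogate, which is the garbage the message warns about. The check costs one comparison, but it has to be present wherever a string is cut.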
Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 02:18 PM, Žilvinas Ledas wrote:
> What if a file on the user computer has...

If you deal with the content of files you did not write yourself, you of course need to deal with whatever encoding they were written in (maybe it's EBCDIC :) ). This is unavoidable, and if you are so unlucky that you need to consider that the file is in Unicode, you of course need to upgrade to being a Unicode expert.

But if you just deal with the user's GUI input and output and with files that you wrote yourself in some default encoding the language tools define, IMHO a decent language should do whatever possible to hide the complexity.

As said: I'm not sure to what extent this is possible and whether Delphi does a good job here, so I don't intend to question any decision made by the Lazarus team now (for FPC without new strings) or in future (for FPC with whatever implementation of a new string feature).

-Michael
Re: [Lazarus] cwstring in arm-linux
On 21.10.2011 15:02, Michael Schnell wrote:
> But if you just deal with the user's GUI input and output and with files that you wrote yourself in some default encoding the language tools define, IMHO a decent language should do whatever possible to hide the complexity.

You'd advocate for fpc/Lazarus to normalize all incoming and outgoing file names then? If you write a file with a file name in Unicode NFC in OS X and read the file name back from the OS, you'll get an NFD string returned, which means a normalization-unaware compare function will not do what you'd expect.

Michael Lutz
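The NFC/NFD mismatch described here can be reproduced without a Mac. The sketch below uses Python's standard `unicodedata` module purely for illustration; the file name "café.txt" and the helper `same_name` are assumptions of this example.

```python
import unicodedata

written = unicodedata.normalize("NFC", "caf\u00e9.txt")   # precomposed e-acute
read_back = unicodedata.normalize("NFD", written)          # 'e' + combining acute accent

naive_equal = (written == read_back)   # a normalization-unaware comparison


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(len(written), len(read_back), naive_equal, same_name(written, read_back))
```

Both strings display identically, yet they differ in length and compare unequal until both are normalized to the same form.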
Re: [Lazarus] cwstring in arm-linux
On 21.10.2011 10:00, Michael Schnell wrote:
> On 10/20/2011 10:26 PM, Felipe Monteiro de Carvalho wrote:
>> Mac OS X uses the decomposed form in UTF-8 to store filenames, which is rather unpleasant.
>
> Why are they so silly?

What's silly about that? If they stored it in precomposed form (NFC) instead, you still couldn't use a simple string compare unless you normalize all strings. And even better, some characters used in some languages simply have no precomposed form at all, which means you'll always have to be prepared to handle characters composed of several Unicode code points.

Michael Lutz
Re: [Lazarus] cwstring in arm-linux
On 10/21/2011 04:13 PM, Michael Lutz wrote:
> If you write a file with a file name in unicode NFC in OS X and read the file name back from the OS, you'll get a NFD string returned, which means a normalization-unaware compare function will not do what you'd expect.

...as already mentioned in another message in this thread, OS X does really silly stuff regarding file names.

How does their Object Pascal handle this?

-Michael
Re: [Lazarus] cwstring in arm-linux
On 21 October 2011 at 17:23, Michael Schnell mschn...@lumino.de wrote:
> On 10/21/2011 04:13 PM, Michael Lutz wrote:
>> If you write a file with a file name in unicode NFC in OS X and read the file name back from the OS, you'll get a NFD string returned, which means a normalization-unaware compare function will not do what you'd expect.
>
> ...as already mentioned in another message in this thread OSX does really silly stuff regarding file names.

The normalization is not silly. The really silly thing is that by default their file system is case insensitive, while many command line tools are not.

> How does their object pascal handle this ?

I don't know what they did, but I know from Lazarus: it's not a big deal. Lazarus has worked on OS X for years and only needed a function to compare file names, which calls the OS X function. Of course Lazarus already supported Linux. It can be hard to port a Windows application to OS X.

Mattias
Re: [Lazarus] cwstring in arm-linux
Graeme Geldenhuys wrote:
> On 2011-10-21 10:31, Michael Schnell wrote:
>> So with these projects you obviously are a Unicode aware programmer and don't qualify for the group of office programmers that (IMHO)
>
> I don't have to think about anything special when working with Unicode text. I simply use the string manipulation function as defined in the fpGUI framework, and not the ones defined in FPC's RTL. The fpGUI framework handles the rest for me.

Did you ever have a look at Delphi StrUtils.pas? Why reinvent the wheel, when functions already exist for some purpose? The Ansi prefix can be removed in an environment where the encoding of the string arguments is known (e.g. fixed to UTF-8).

The bad news about new Delphi strings is the addition (overload) of functions with RawByteString arguments, which *take* strings of any encoding, but then *ignore* that encoding. These versions certainly must fail for all MBCS encodings :-(

DoDi
Re: [Lazarus] cwstring in arm-linux
Graeme Geldenhuys wrote:
> On 2011-10-21 10:19, Hans-Peter Diettrich wrote:
>> Please specify Finding, a code snippet would be nice.
>
> Knock yourself out...
>
> https://github.com/graemeg/fpGUI/blob/master/src/corelib/fpg_stringutils.pas
>
> Take a look at UTF8Copy() or UTF8Insert() etc.

I didn't mean the implementation, but the *task* to perform in application code.

>> in FPC, until now. Give an example of UTF-8 code, which would become *more* complicated with UTF-16.
>
> Consider a Copy() type function where you want to copy a Unicode codepoint (think single character as you see on the screen - ignoring combining diacritics for now) out from a string.

Again, *why* would you ever want to do that? It sounds to me like extracting bits from floating point values :-(

> UTF8Copy() as defined above will do that correctly, irrespective if the codepoint is in the BMP or Supplementary Plane or if the character is represented by 1,2,3 or 4 bytes in length.

Why restrict such a function to UTF-8? For working with *logical* characters a set of functions is needed that do not rely on character indices. A StartIndex parameter IMO indicates bad design :-(

The functions can easily be overloaded to work with AnsiChar and WideChar string arguments, or even UCS4Char, if you like.

> With UTF-16 you need to check if the UTF-16 string is Little Indian or Big Indian (UTF-16BE or UTF-16LE),

This has to be done only on input from a file, where every external encoding should be converted into the internal representation. BTW, it's Endian, not Indian nor Chinese ;-)

> whether the codepoint has a surrogate pair or not. All in all, a lot more complex than UTF-8.

Sorry, UTF-8 and UTF-16 only provide different encodings for the same Unicode codepoints. Mixing Char and codepoint indices and counts is never a good idea. With that in mind it's no problem to perform the same task on any encoding.

DoDi
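The claim that the same task can be performed on any encoding can be sketched for UTF-16: walking code points needs one extra branch for surrogate pairs. This is an illustrative Python sketch; `codepoints_from_utf16` is a hypothetical name, and the input is assumed to be already in native byte order, as the message suggests for the internal representation.

```python
def codepoints_from_utf16(units):
    """Yield Unicode code points from a sequence of 16-bit UTF-16 code units."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF                   # high surrogate ...
            and i + 1 < n
            and 0xDC00 <= units[i + 1] <= 0xDFFF    # ... followed by a low surrogate
        )
        if is_pair:
            # Combine the two 10-bit halves and add the 0x10000 offset.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield u   # BMP code point (an unpaired surrogate is passed through unchanged)
            i += 1


decoded = list(codepoints_from_utf16([0x61, 0xD834, 0xDD1E, 0x7A]))
print([hex(cp) for cp in decoded])
```

Once iteration is by code point rather than by Char index, a copy or insert routine can be written on top of it in the same way for UTF-8 or UTF-16.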
Re: [Lazarus] cwstring in arm-linux
Michael Schnell wrote:
> On 10/20/2011 09:42 PM, Felipe Monteiro de Carvalho wrote:
>> Changing the size of Char is not just a small detail; this breaks *a lot* of code. Any kind of memory operations such as Move will fail because the char size changed.
>
> Of course you are right, but Move and friends are close-to-the-hardware programming for those who know what they are doing. But basic (legacy) string operations like myChar := myString[i] are office-level programming and thus should work as a dummy expects.

Simple solution: use UTF-32 encoding :-)

It's only a matter of optimization: save memory with more compressed encodings, or save coding time?

DoDi
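The trade-off named here can be sketched as follows: with UTF-32 every code point occupies exactly four bytes, so the n-th code point sits at a fixed offset. The sketch is Python for illustration only; `char_at` is a hypothetical helper that uses a 1-based index to mirror Pascal string indexing.

```python
text = "a\u00e9\U0001D11Ez"        # ASCII, Latin-1 range, non-BMP, ASCII
raw = text.encode("utf-32-le")     # fixed width: 4 bytes per code point, no BOM


def char_at(data: bytes, index: int) -> str:
    """Return the code point at 1-based position `index` of UTF-32LE data."""
    offset = 4 * (index - 1)
    return data[offset:offset + 4].decode("utf-32-le")


print(len(raw), len(text.encode("utf-8")), char_at(raw, 3) == "\U0001D11E")
```

Indexing becomes trivial at the cost of memory (16 bytes here against 8 in UTF-8). Characters built from several code points, such as a base letter plus a combining accent, still span more than one element even in UTF-32.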
Re: [Lazarus] cwstring in arm-linux
Michael Lutz wrote:
> On 21.10.2011 15:02, Michael Schnell wrote:
>> But if you just deal with the user's GUI input and output and with files that you wrote yourself in some default encoding the language tools define, IMHO a decent language should do whatever possible to hide the complexity.
>
> You'd advocate for fpc/Lazarus to normalize all incoming and outgoing file names then?

Please distinguish between file names and content. File names are subject to platform conventions, e.g. case sensitivity and directory separators. File content and encoding, in contrast, are fully up to the creator.

DoDi
Re: [Lazarus] cwstring in arm-linux
Žilvinas Ledas wrote:
Hello,
On 2011-10-21 10:43, Michael Schnell wrote:
Of course you are right, but Move and friends are hardware-near programming for those who know what they are doing. But basic (legacy) string operations like myChar := myString[i] are office-level programming and thus should work as a dummy expects.

What if a file on the user's computer has a 4-byte [visible] character as its 8th character and you, for example, want to get an 8-character file name? In this case you split that 4-byte character and have garbage.

Then you (or your boss) didn't understand the meaning of 4 characters. (Logical) characters are different from physical Chars, in every MBCS codepage.

What if the user inputs in your text field (or a command line parameter or anywhere else) a string containing a 4-byte character and you split that string on that character? (For example when showing some kind of summary of his input.) Don't forget that the user can input characters by copy-pasting them from the web, not only using his keyboard!

See above. With proportional fonts, counting characters is a bad idea; instead the width of the displayed string (in pixels) should be used. Then you can also deal with languages and character sets which use ligatures and the like. Even with monospaced fonts the characters (glyphs) can have different widths, in multiples of the basic width, e.g. for Chinese or other eastern character sets.

So, if you want to write PROFESSIONAL software with any user input - you must handle 4-byte characters at every place you get user input.

Counting characters then is a bad idea, see above.

Otherwise you leave a chance to get garbage and show it to the user. Is this really easier than using UTF8 everywhere?

My personal experience: I am maintaining (as a hobby project) a multi-language dictionary program (a screenshot: http://2.bp.blogspot.com/_3-IaodGIbVQ/TMHY-l9M4sI/Aak/AbtShWq0ZUQ/s1600/KZod_screen_win7.png

Great :-)

) and it involves quite a bit of [multilingual] string manipulation, and when I did the migration from Delphi to Lazarus I didn't know about the requirement that all (GUI) strings must be UTF8 and I had no problems migrating! Yes, afterwards I tweaked some calls to RTL (mostly file handling) functions that expected to get ANSI encoding, but this is not a problem of UTF8, but of the RTL being (mostly) ANSI.

From which Delphi version did you migrate? What encoding did you use in Delphi?

DoDi
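The "8th character is a 4-byte sequence" scenario can be reproduced in a few lines. The following is only a sketch under stated assumptions: Utf8Truncate is a hypothetical helper, not a LazUtils or RTL routine, the input is assumed to be well-formed UTF-8, and combining sequences and display width (DoDi's objection) are deliberately ignored.

```pascal
program SafeTruncate;
{$mode objfpc}{$H+}

// Hypothetical helper: return the leading part of a UTF-8 string holding at
// most MaxCodepoints codepoints, cutting only on a codepoint boundary.
function Utf8Truncate(const S: AnsiString; MaxCodepoints: Integer): AnsiString;
var
  i, Count: Integer;
begin
  Count := 0;
  i := 1;
  while i <= Length(S) do
  begin
    // a non-continuation byte starts the next codepoint
    if (Ord(S[i]) and $C0) <> $80 then
    begin
      if Count = MaxCodepoints then
        Break;
      Inc(Count);
    end;
    Inc(i);
  end;
  Result := Copy(S, 1, i - 1);
end;

var
  FileName: AnsiString;
begin
  // seven ASCII letters, then U+1D11E (4 bytes in UTF-8), then '.txt'
  FileName := 'abcdefg' + #$F0#$9D#$84#$9E + '.txt';
  // byte-based Copy keeps 8 bytes and cuts inside the 4-byte sequence
  WriteLn(Length(Copy(FileName, 1, 8)));
  // codepoint-based truncation keeps 7 + 4 = 11 bytes
  WriteLn(Length(Utf8Truncate(FileName, 8)));
end.
```

The byte-based Copy leaves a dangling lead byte at the end of the result, which is the garbage the message describes. The codepoint-based version avoids that, but it still says nothing about how wide the result is on screen.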
Re: [Lazarus] cwstring in arm-linux
Graeme Geldenhuys wrote:
On 2011-10-20 17:30, Luca Olivetti wrote:
Additionally, 16 bits is enough to cover the BMP, Basic Multilingual Plane, which encompasses the majority of today's most widely used languages. Only when you get to more advanced codepoints in some of the far-eastern languages, or are needing to encode dead languages such as Egyptian hieroglyphics do you need more than 16 bits.

That is such a rubbish statement! More and more information is being added outside Unicode's BMP. Emoticons, Science and Maths symbols, Map Symbols (often seen in GPS applications), Music notes etc etc.

What do you have in mind that your code would do with e.g. music notes? Would it ever try to convert these into upper case, or to substitute parts of such strings by text??? You can get rubbish more easily, by random substitution of English words by German or Chinese ones... Or by parsing C code with a Pascal parser...

DoDi
Re: [Lazarus] cwstring in arm-linux
Graeme Geldenhuys wrote:
On 2011-10-21 10:22, Hans-Peter Diettrich wrote:
Now also tell us how application code is affected by such astral codepoints, and how these are handled easier in UTF-8 than in UTF-16.

As to not repeat myself, see one of my other replies.

I didn't ask for a repetition of *what* you did, but for a serious reason *why* you want to do what. You don't need Unicode for writing bogus applications, it only helps to demonstrate how stupid some ideas are ;-)

DoDi
Re: [Lazarus] cwstring in arm-linux
On Wednesday 19 October 2011 22.05:09 Vincent Snijders wrote:
2011/10/19 Michael Van Canneyt mich...@freepascal.org:
On Wed, 19 Oct 2011, Felipe Monteiro de Carvalho wrote:
On 10/19/11, Vincent Snijders vincent.snijd...@gmail.com wrote:
I guess Felipe gave up waiting on a Unicode RTL for the time being and goes for a full UTF8 pseudo RTL in LazUtils.

But this does not mean that LazUtils would not be useful then. My proposals to add UTF-8 routines to the RTL and even FCL were rejected,

Correction: Your proposals were not rejected.

Thanks for the clarification.

No decision as to which character sets will be used in the basic RTL has been taken. Any action you take now is therefore premature. So it was suggested you would wait until things settle down and the final shape of things is clearer.

That is why I said: gave up waiting

I think it would have been better if Lazarus had made an RTL optimized for Lazarus a long time ago. Now the FPC team destroyed a stable product in favor of Delphi string compatibility. The introduction of multi-encoding strings is not a really good idea IMHO but more a marketing gag. It seems that the Delphi architects are not absolutely happy with it either. Allen Bauer in: https://forums.codegear.com/message.jspa?messageID=400258#400258

We have way, way too many different string types. It's confusing.

There are more interesting statements in that thread, e.g.: https://forums.codegear.com/message.jspa?messageID=399964#399964 It is even possible that Delphi strings change again...

Martin
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 09:18 AM, Martin Schreiber wrote:
Now the FPC team destroyed a stable product in favor of Delphi string compatibility.

I don't consider the current state of Lazarus stable, as IMHO the "UTF8 in type ANSIString" paradigm (seemingly forced by the underlying FPC version) is too special to stay. As I don't have a Delphi 2009, I have no idea if the new Delphi way to handle Unicode is desirable or even better at all. So this is not meant as a criticism whatsoever.

-Michael
Re: [Lazarus] cwstring in arm-linux
On Thursday 20 October 2011 09.32:05 you wrote:
On 10/20/2011 09:18 AM, Martin Schreiber wrote:
Now the FPC team destroyed a stable product in favor of Delphi string compatibility.

I don't consider the current state of Lazarus stable,

I don't refer to Lazarus but to Free Pascal compiler, RTL and FCL.

Martin
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 09:43 AM, Martin Schreiber wrote:
I don't refer to Lazarus but to Free Pascal compiler, RTL and FCL.

I suppose that there will be some kind of compiler switch allowing for selecting whether or not to use the new string feature.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 09:43 AM, Martin Schreiber wrote:
I don't refer to Lazarus but to Free Pascal compiler, RTL and FCL.

I suppose that there will be some kind of compiler switch allowing for selecting whether or not to use the new string feature.

That does not change the possibility that FPC compiler, RTL and FCL will be less stable than now because of the greater complexity and because of the possible new bugs.

Martin
Re: [Lazarus] cwstring in arm-linux
Martin Schreiber wrote:
On 10/20/2011 09:43 AM, Martin Schreiber wrote:
I don't refer to Lazarus but to Free Pascal compiler, RTL and FCL.

I suppose that there will be some kind of compiler switch allowing for selecting whether or not to use the new string feature.

That does not change the possibility that FPC compiler, RTL and FCL will be less stable than now because of the greater complexity and because of the possible new bugs.

Hmm, "now" is wrong, it should be "before the cpstrnew merge"...

Martin
Re: [Lazarus] cwstring in arm-linux
Hi,
On 2011-10-20 00:03, Felipe Monteiro de Carvalho wrote:
Hello,
2011/10/19 Žilvinas Ledas zilvinas.le...@dict.lt:
I am a native Lithuanian so I think I can help at least by providing info, but I must understand what the problem is first.

I am mostly interested in LowerCase / UpperCase. Could you explain how it works in Lithuanian and provide test cases for it? Test cases should be in this format:

AssertStringOperationUTF8LowerCase('Unicode 0460 UTF8LowerCase', '', 'ѠѡѢѣѤѥѦѧѨѩѪѫѬѭѮѯ', 'ѡѡѣѣѥѥѧѧѩѩѫѫѭѭѯѯ');

Even better if they are in patches to the file lazarus/tests/lazutils/testunicode.pas

The first param is the label, the second the locale (in this case maybe something like 'lt'; what is the ISO identifier for Lithuanian?), then UpperCase and then LowerCase. And try to make some tricky tests, to defeat partial implementations. 3 tests for lowercase and 3 for uppercase should be enough.

This should be easy. This week I have a lot of things to do but I'll try to look into this next week!

Regards,
Žilvinas Ledas
Re: [Lazarus] cwstring in arm-linux
Michael Schnell wrote:
On 10/20/2011 09:18 AM, Martin Schreiber wrote:
Now the FPC team destroyed a stable product in favor of Delphi string compatibility.

I don't consider the current state of Lazarus stable, as IMHO the "UTF8 in type ANSIString" paradigm (seemingly forced by the underlying FPC version) is too special to stay.

It's sufficient to agree that all (displayed) strings in the LCL contain UTF-8 text, regardless of their type name (the string types currently are aliases).

As I don't have a Delphi 2009, I have no idea if the new Delphi way to handle Unicode is desirable or even better at all. So this is not meant as a criticism whatsoever.

Delphi allows for a UTF8String type, but this one is (or has to be) converted into UnicodeString all the time. The Delphi RTL only supports UnicodeString (UTF-16) and native AnsiStrings (of CP_ACP); all other encodings are not really supported, except for UTF-16 conversion. MBCS are supported only as far-eastern DBCS, not for UTF-8 (I wonder what a Linux version will bring). Functions with more than one (Ansi)String argument deserve special care: the *user* is responsible for supplying only strings of the same encoding, or has to force the use of the UnicodeString versions by e.g. typecasts. I.e. it's highly discouraged to use any but CP_ACP or UTF-16 strings, except for corner cases (file I/O...).

When FPC follows the new Delphi model, the LCL has to be ported so that all strings contain UTF-16 - everything else will not work properly or will cause many implicit conversions. This may require some work, results in two incompatible versions (legacy Ansi/UTF-8 and new Unicode/UTF-16), and will not please all Linux (POSIX) users. IMO Linux is not a problem with the LCL, since the cost of the currently required UTF-8/16 conversions around external function calls is negligible (on my Windows system).

DoDi
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 01:55 PM, Hans-Peter Diettrich wrote:
It's sufficient to agree that all (displayed) strings in the LCL contain UTF-8 text, regardless of their type name (string types currently are alias).

And thus functions like pos(), length() and myString[i] work on UTF-8 code bytes rather than on (displayed) characters. Agreeable for those who just use ASCII (no Germans, ...) and those who have a decent knowledge of Unicode. All others are fooled.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 01:55 PM, Hans-Peter Diettrich wrote:
Functions with more than one (Ansi)String argument deserve special care, the *user* is responsible to only supply strings of the same encoding,

Very funny! They invent a dynamically encoded string type that has the power to trigger conversions when necessary and abuse it.

-Michael
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 01:55 PM, Hans-Peter Diettrich wrote:
It's sufficient to agree that all (displayed) strings in the LCL contain UTF-8 text, regardless of their type name (string types currently are alias).

Plus: a char is not a displayable character but can only hold a UTF-8 code byte.

-Michael
Re: [Lazarus] cwstring in arm-linux
On Thu, Oct 20, 2011 at 2:54 PM, Michael Schnell mschn...@lumino.de wrote:
All others are fooled.

No, they aren't. Please stop repeating this. I think you have already sent 10 messages in various threads saying that people are naturally unable to use UTF-8. This is not true. My second-year engineering students learned on their own how to use UTF-8 properly. People asking questions on the forum learned how to use it. So far I have not seen a single person who has such a problem with UTF-8 that they are mentally blocked from learning it, even while the same person can use UCS-2 fluently. This is all just fiction. Real experience shows that people can easily learn to use UTF-8.

-- Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 02:55 PM, Felipe Monteiro de Carvalho wrote:
On Thu, Oct 20, 2011 at 2:54 PM, Michael Schnell mschn...@lumino.de wrote:
All others are fooled.

This is not true. My students from the 2nd year of engineering learned alone how to use UTF-8 properly.

That is exactly what I meant to say. Those who do learn how to deal with Unicode might be very happy to keep the Unicode encoding in mind with all string operations. And if your opinion is that everybody who wants to program with Lazarus is happy when he also learns the ways of Unicode, I will not contradict. But IMHO Lazarus should be (at least) as easy to use as Java and friends and not provide additional traps for the Unicode-illiterates. (I once proposed to drop the support for myString[i] or for the char type altogether to prevent some of these traps, but supposedly this is a silly idea.)

-Michael
Re: [Lazarus] cwstring in arm-linux
On Thu, Oct 20, 2011 at 3:10 PM, Michael Schnell mschn...@lumino.de wrote:
But IMHO Lazarus should be (at least) as easy to use as Java and friends and not provide additional traps for the Unicode-illiterates.

This argumentation is ridiculous, string usage in Java has tons of traps. To start with, it is a special class type with special handling and immutable o.O Then you have no var parameters to pass them around (and even if you had, they are immutable)... etc.

But let's go back to real arguments: Do you have any big applications written in Lazarus? If you had, how happy would you be having to convert them from utf-8 to utf-16?

-- Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On 10/20/2011 03:45 PM, Felipe Monteiro de Carvalho wrote:
Do you have any big applications written in Lazarus? If you had, how happy would you be having to convert them from utf-8 to utf-16?

I never voted for a UTF-16 Lazarus. In fact I have lots of applications done in Delphi < 2009 and thus not Unicode aware at all but using just 8 bit fixed locale ANSI encoded strings. I was happily converting some of those to the pre-Unicode version of Lazarus (for having them run on Linux). I was not at all happy trying to convert them to the "always UTF-8" version of Lazarus. I am sure that I will not be more happy trying to convert them to a hypothetical "always UTF-16" version of Lazarus. I have no idea how happy I would be trying to convert them to a Unicode aware Delphi version 2009. And of course I have no idea at all how happy I would be trying to convert them to an upcoming "new Delphi String" version of Lazarus.

-Michael
Re: [Lazarus] cwstring in arm-linux
On Thu, Oct 20, 2011 at 4:19 PM, Michael Schnell mschn...@lumino.de wrote:
I was not at all happy trying to convert them to the always UTF-8 version of Lazarus.

Well, here is the surprise now: utf-8 was chosen exactly to facilitate porting old applications while still supporting all of the Unicode standard.

Length does not give the number of chars? No problem:

  uses lazutf8;
  function UTF8Length(const s: string): PtrInt;

How to iterate through chars? Using this:

  function UTF8CharacterLength(p: PChar): integer;

How to find the n-th char?

  // find the n-th UTF8 character, ignoring BIDI
  function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;
  // find the byte index of the n-th UTF8 character, ignoring BIDI (byte len of substr)
  function UTF8CharToByteIndex(UTF8Str: PChar; Len, CharIndex: PtrInt): PtrInt;

Having problems in Pos, Copy, Delete or Insert? Just replace with these:

  function UTF8Pos(const SearchForText, SearchInText: string): PtrInt;
  function UTF8Copy(const s: string; StartCharIndex, CharCount: PtrInt): string;
  procedure UTF8Delete(var s: String; StartCharIndex, CharCount: PtrInt);
  procedure UTF8Insert(const source: String; var s: string; StartCharIndex: PtrInt);

The switch is really easy. There are routines which are equivalent to all operations done previously.

-- Felipe Monteiro de Carvalho
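A short usage sketch of some of the routines listed above, next to their byte-based RTL counterparts. It assumes the lazutf8 unit from LazUtils is on the unit search path and that the three functions keep the signatures quoted in the message; the test string is written as explicit bytes so the result does not depend on the encoding of the source file.

```pascal
program LazUtf8Demo;
{$mode objfpc}{$H+}

uses
  LazUTF8; // assumption: LazUtils is installed and on the unit search path

var
  S: string;
begin
  // 'Gr' + U+00FC (2 bytes in UTF-8) + 'n'
  S := 'Gr' + #$C3#$BC + 'n';

  WriteLn(Length(S));                  // RTL Length counts bytes: 5
  WriteLn(UTF8Length(S));              // UTF8Length counts codepoints: 4
  WriteLn(Pos('n', S));                // byte index of 'n': 5
  WriteLn(UTF8Pos('n', S));            // codepoint index of 'n': 4
  WriteLn(Length(UTF8Copy(S, 3, 1)));  // the third codepoint occupies 2 bytes
end.
```

For pure ASCII input the two families return the same numbers, which is why code that only ever saw ASCII keeps working unchanged.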
Re: [Lazarus] cwstring in arm-linux
On 20/10/2011 16:34, Felipe Monteiro de Carvalho wrote:
The switch is really easy. There are routines which are equivalent to all operations done previously.

But you have to do it manually (or semi-automatically), which is a lot of work and possibly error prone. With utf-16, on the other hand, you shouldn't have to change any routine name at all, unless you have to deal with characters outside the BMP. According to this message: https://forums.codegear.com/message.jspa?messageID=399964#399964

Additionally, 16 bits is enough to cover the BMP, Basic Multilingual Plane, which encompasses the majority of today's most widely used languages. Only when you get to more advanced codepoints in some of the far-eastern languages, or are needing to encode dead languages such as Egyptian hieroglyphics do you need more than 16 bits.

Bye
-- Luca Olivetti Wetron Automation Technology http://www.wetron.es Tel. +34 935883004 (Ext.133) Fax +34 935883007
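What "outside the BMP" means for UTF-16 can be made concrete with a little arithmetic: a codepoint from U+10000 upwards is stored as two 16-bit units, a lead and a trail surrogate. The sketch below is illustrative only; ToSurrogates is a made-up name, and the caller is assumed to pass a codepoint in the range U+10000..U+10FFFF.

```pascal
program SurrogatePair;
{$mode objfpc}{$H+}

uses
  SysUtils; // for IntToHex

// Hypothetical helper: split a codepoint above the BMP into a UTF-16
// surrogate pair. No range check is done on CodePoint.
procedure ToSurrogates(CodePoint: Cardinal; out Lead, Trail: Word);
begin
  CodePoint := CodePoint - $10000;          // leaves a 20-bit value
  Lead := $D800 or (CodePoint shr 10);      // upper 10 bits
  Trail := $DC00 or (CodePoint and $3FF);   // lower 10 bits
end;

var
  Lead, Trail: Word;
begin
  // U+1D11E, the musical G clef mentioned elsewhere in the thread
  ToSurrogates($1D11E, Lead, Trail);
  WriteLn(IntToHex(Lead, 4), ' ', IntToHex(Trail, 4)); // D834 DD1E
end.
```

Code that treats every WideChar as one character keeps working as long as no such pair shows up in the data, which is the situation the quoted forum message describes.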
Re: [Lazarus] cwstring in arm-linux
Michael Schnell mschn...@lumino.de wrote on 20 October 2011 at 15:10:
On 10/20/2011 02:55 PM, Felipe Monteiro de Carvalho wrote:
On Thu, Oct 20, 2011 at 2:54 PM, Michael Schnell mschn...@lumino.de wrote:
All others are fooled.

This is not true. My students from the 2nd year of engineering learned alone how to use UTF-8 properly.

That is exactly what I meant to say. Those who do learn how to deal with Unicode might be very happy to keep in mind the Unicode encoding with all string operations. And if your opinion is that everybody, who wants to program with Lazarus, is happy when he also learns the ways of Unicode, I will not contradict. But IMHO Lazarus should be (at least) as easy to use as Java and friends and not provide additional traps for the Unicode-illiterates.

What Java do you have in mind? Last time I used Oracle/Sun Java it still used a 2-byte char and you needed to set the compiler/IDE to UTF8, otherwise your source code was not portable. We have a lot of students writing Java programs under Windows, then wondering why their programs create garbage under Linux. Often they say: Linux has problems with Unicode. Reason: teachers think that Unicode is so simple under Java, so they don't explain it.

(I once proposed to drop the support for myString[i] or for the char type altogether to prevent some of these traps, but supposedly this is a silly idea.)

If you have students that stupid, then don't tell them about the [] operator.

Mattias
Re: [Lazarus] cwstring in arm-linux
On Thu, Oct 20, 2011 at 5:30 PM, Luca Olivetti l...@wetron.es wrote:
But you have to manually (or semi-automatically) do it, which is a lot of work and possibly error prone. While, with utf-16, you shouldn't change any routine name at all, unless you have to deal with characters outside the BMP.

You say so, but while I cannot comment with certainty since I have never used the Unicode Delphi, from what I read people had major difficulties doing the migration. It was not at all easy. For me, in contrast, the migration from ansi to utf-8 was trivially easy.

Changing the size of Char is not just a small detail, this breaks *a lot* of code. Any kind of memory operation such as Move will fail because the char size changed. Not to mention people that were using PChar to address memory which is not really a string =D Suddenly the steps double in size and your whole memory layout changes...

-- Felipe Monteiro de Carvalho
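The Move problem comes down to Move counting bytes while Length counts elements. A minimal sketch of the difference, using the explicit AnsiString and UnicodeString types so that it does not depend on what the compiler currently maps String and Char to:

```pascal
program MoveAndCharSize;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  W: UnicodeString;
  BufA: array[0..15] of AnsiChar;
  BufW: array[0..15] of WideChar;
begin
  A := 'hello';
  W := 'hello';

  // Move takes a byte count, so the element size belongs in the count.
  // Legacy code often passes Length(S) alone, which is only right for
  // 1-byte chars and copies half the string once Char is 2 bytes wide.
  Move(A[1], BufA[0], Length(A) * SizeOf(AnsiChar));
  Move(W[1], BufW[0], Length(W) * SizeOf(WideChar));

  WriteLn(Length(A) * SizeOf(AnsiChar)); // 5 bytes
  WriteLn(Length(W) * SizeOf(WideChar)); // 10 bytes
end.
```

Writing the count as Length(S) * SizeOf(S[1]) keeps such code correct under either definition of Char.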
Re: [Lazarus] cwstring in arm-linux
On Thu, Oct 20, 2011 at 2:54 PM, Michael Schnell mschn...@lumino.de wrote:
And thus functions like pos(), length() and myString[i] work on UTF-8 code bytes rather than on (displayed) characters.

Characters can be composed of separate codepoints for accent + character (so at least 4 bytes in UTF-16). So if you write code which depends on [] indexing characters your code will fail miserably in this case. Mac OS X uses the decomposed form in UTF-8 to store filenames, which is rather unpleasant. If you convert this to UTF-16 for further work the text will not magically get composed, although one could pass it through a composing pre-processor.

-- Felipe Monteiro de Carvalho
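The composed and decomposed forms of the same visible character can be compared directly. This is a plain RTL sketch with the UTF-8 bytes written out explicitly. No normalization routine is called, because which library would provide one is outside what the thread states.

```pascal
program ComposedVsDecomposed;
{$mode objfpc}{$H+}

var
  Composed, Decomposed: AnsiString;
begin
  Composed := #$C3#$A9;           // U+00E9, e with acute as one codepoint
  Decomposed := 'e' + #$CC#$81;   // U+0065 followed by U+0301 combining acute

  WriteLn(Length(Composed));       // 2 bytes
  WriteLn(Length(Decomposed));     // 3 bytes
  WriteLn(Composed = Decomposed);  // FALSE: byte-wise comparison
end.
```

Both strings render as the same glyph, yet a plain comparison or a file name lookup treats them as different. Converting both to UTF-16 gives one code unit versus two, so the mismatch survives the conversion.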
Re: [Lazarus] cwstring in arm-linux
Felipe Monteiro de Carvalho wrote:
On Thu, Oct 20, 2011 at 5:30 PM, Luca Olivetti l...@wetron.es wrote:
But you have to manually (or semi-automatically) do it, which is a lot of work and possibly error prone. While, with utf-16, you shouldn't change any routine name at all, unless you have to deal with characters outside the BMP.

You say so, but while I cannot comment with certainty since I have never used the Unicode Delphi, from what I read people had major difficulties doing the migration. It was not at all easy. While for me the migration from ansi to utf-8 was trivially easy.

The Ansi/UTF-16 migration is much easier than a migration to UTF-8. When your legacy code can assume that every (visible) character is a Char, in an SBCS codepage, this is not different in UTF-16. But the same is not true for UTF-8, where characters of an Ansi SBCS codepage can translate into multi-byte sequences.

Changing the size of Char is not just small detail, this breaks *a lot* of code. Any kind of memory operations such as Move will fail because the char size changed.

Why would *application* code ever do low-level fiddling with *managed* strings???

Not to mention people that were using PChar to address memory which is not really a string =D suddenly the steps duplicate in size and your whole memory layout changes...

Then replace all occurrences of String by AnsiString, and Char by AnsiChar (global find & replace). And replace all (possible) usages of UTF8String by AnsiString, to prevent possible encoding conversions. Then all your code should work as before. Problems may arise from standard text components (TStrings...), when these are not also available in Ansi versions - but this only affects the runtime, due to implicit conversions. This is where the RTL and FCL deserve some more consideration, and the future will tell...

DoDi
Re: [Lazarus] cwstring in arm-linux
On Mon, Oct 3, 2011 at 4:35 PM, Henry Vermaak henry.verm...@gmail.com wrote:
That's good news, thanks!

Hello,

Could you test the very latest Pascal Widestring Manager? Just disable cwstring and then add paswstring as the first unit in your project's uses clause. The Pascal Widestring Manager is complete, but it needs more testing =)

-- Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On Mon, Oct 03, 2011 at 04:31:20PM +0200, Felipe Monteiro de Carvalho wrote:
Ok, I changed the define in rev 32655. But you should note that when paswstring gets finished it will phase out cwstrings.

Not that I know. And btw, I also use arm-linux without android, so please keep that target intact and aligned with normal linux ports.
Re: [Lazarus] cwstring in arm-linux
On Wed, Oct 19, 2011 at 12:06 PM, Marco van de Voort mar...@stack.nl wrote:
Not that I know. And btw, I also use arm-linux without android, so please keep that target intact and aligned with normal linux ports.

What is the difference between using cwstring and paswstring? Any reason for not wanting to use paswstring? They should be 100% equal, except that one does not require any external libraries. If you can test and check whether there are any differences, that of course would be excellent =)

-- Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On Wednesday 19 October 2011 13.14:50 Felipe Monteiro de Carvalho wrote:
On Wed, Oct 19, 2011 at 12:06 PM, Marco van de Voort mar...@stack.nl wrote:
Not that I know. And btw, I also use arm-linux without android, so please keep that target intact and aligned with normal linux ports.

What is the difference between using cwstring and paswstring? Any reason for not wanting to use paswstring?

Where is paswstring?

Martin
Re: [Lazarus] cwstring in arm-linux
Marco van de Voort wrote:
On Mon, Oct 03, 2011 at 04:31:20PM +0200, Felipe Monteiro de Carvalho wrote:
Ok, I changed the define in rev 32655. But you should note that when paswstring gets finished it will phase out cwstrings.

Not that I know. And btw, I also use arm-linux without android, so please keep that target intact and aligned with normal linux ports.

After some discussions in Embarcadero groups I would like to learn more about the FPC implementation and goals of the new (Unicode...) strings. Where should I have a look? In detail it turned out that Delphi only supports CP_ACP strings for Ansi codepages, not including UTF-8. Strings with other encodings may be converted properly (not yet), but otherwise should not be used with standard string handling procedures. Will this be changed in the FPC RTL, so that at least UTF8Strings are also supported properly?

DoDi
Re: [Lazarus] cwstring in arm-linux
On Wed, Oct 19, 2011 at 1:24 PM, Martin Schreiber mse00...@gmail.com wrote:
Where is paswstring?

http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/paswstring.pas?view=markup&root=lazarus

It uses lazutf8 (which includes, most importantly, UTF16ToUTF8 and vice versa, and utf8LowerCase and utf8UpperCase) and lconvencoding (which includes encoding tables), which are in the same folder.

-- Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On 19.10.2011 14:08, Hans-Peter Diettrich wrote:
Marco van de Voort wrote:
On Mon, Oct 03, 2011 at 04:31:20PM +0200, Felipe Monteiro de Carvalho wrote:
Ok, I changed the define in rev 32655. But you should note that when paswstring gets finished it will phase out cwstrings.

Not that I know. And btw, I also use arm-linux without android, so please keep that target intact and aligned with normal linux ports.

After some discussions in Embarcadero groups I would like to learn more about the FPC implementation and goals of the new (Unicode...) strings. Where should I have a look? In detail it turned out that Delphi only supports CP_ACP strings for Ansi codepages, not including UTF-8. Strings with other encodings may be converted properly (not yet), but otherwise should not be used with standard string handling procedures. Will this be changed in the FPC RTL, so that at least UTF8Strings are also supported properly?

Uhm... isn't this better suited to fpc-devel?

Regards,
Sven
Re: [Lazarus] cwstring in arm-linux
On Wed, Oct 19, 2011 at 01:14:50PM +0200, Felipe Monteiro de Carvalho wrote:
On Wed, Oct 19, 2011 at 12:06 PM, Marco van de Voort mar...@stack.nl wrote:
Not that I know. And btw, I also use arm-linux without android, so please keep that target intact and aligned with normal linux ports.

What is the difference between using cwstring and paswstring? Any reason for not wanting to use paswstring?

Simply integrating with the OS, and avoiding the inclusion of tables when not necessary. Moreover you are stating something as a fact here that was not discussed at all.

They should be 100% equal, except that one does not require any external libraries. If you can test and check if there are any differences of course would be excellent =)

I haven't been testing it, and don't plan to. I'm not interested in it, and am not interested in growing the binaries unnecessarily. I have no problem with having a second option for the people that do want it, but that is something entirely different from what you were saying. Cwstring is staying on all normal targets as far as I know.
Re: [Lazarus] cwstring in arm-linux
On Wed, Oct 19, 2011 at 6:47 PM, Marco van de Voort mar...@stack.nl wrote:
Moreover you are stating something as a fact here that was not discussed at all.

I am confused by your statements. The discussion here is about the usage of cwstring in the LCL, and I said that I want to replace cwstring with paswstring in the LCL (after making sure it is completely equivalent). Are you also discussing the usage of cwstring in the LCL? Your comments make me think that you are assuming I am talking about the RTL or something like that.

-- Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On Wednesday 19 October 2011 18.59:06 Felipe Monteiro de Carvalho wrote:
> On Wed, Oct 19, 2011 at 6:47 PM, Marco van de Voort mar...@stack.nl wrote:
>> Moreover you are stating something as a fact here that was not discussed at all.
>
> I am confused by your statements. The discussion here is about the usage of cwstring in the LCL, and I said that I want to replace cwstring with paswstring in the LCL (after making sure it is completely equivalent). Are you also discussing the usage of cwstring in the LCL? Your comments make me think that you are assuming I am talking about the RTL or something like that.

Ah, sorry, I read it wrong too...

Martin
Re: [Lazarus] cwstring in arm-linux
On Wed, Oct 19, 2011 at 06:59:06PM +0200, Felipe Monteiro de Carvalho wrote:
> I am confused by your statements. The discussion here is about the usage of cwstring in the LCL, and I said that I want to replace cwstring with paswstring in the LCL (after making sure it is completely equivalent). Are you also discussing the usage of cwstring in the LCL? Your comments make me think that you are assuming I am talking about the RTL or something like that.

No, sorry. Though I still think that is not a good thing either.
Re: [Lazarus] cwstring in arm-linux
2011/10/19 Marco van de Voort mar...@stack.nl:
> On Wed, Oct 19, 2011 at 06:59:06PM +0200, Felipe Monteiro de Carvalho wrote:
>> I am confused by your statements. The discussion here is about the usage of cwstring in the LCL, and I said that I want to replace cwstring with paswstring in the LCL (after making sure it is completely equivalent). Are you also discussing the usage of cwstring in the LCL? Your comments make me think that you are assuming I am talking about the RTL or something like that.
>
> No, sorry. Though I still think that is not a good thing either.

I guess Felipe gave up waiting on a Unicode RTL for the time being and goes for a full UTF8 pseudo RTL in LazUtils.

Vincent
Re: [Lazarus] cwstring in arm-linux
On Wed, Oct 19, 2011 at 6:33 PM, Martin Schreiber mse00...@gmail.com wrote:
> Does it use locale specific collation in PasUnicodeCompareStr and PasUnicodeCompareText?

Good point, no, not yet. But this affects only Turkish, Azeri and Lithuanian AFAIK. Adding Turkish and Azeri is trivial, because UTF8LowerCase supports them, but I have not yet understood the rules for Lithuanian; they are quite convoluted and depend on nearby chars and stuff like that.

> Is the performance of UTF8LowerCase and UTF8UpperCase OK?

UTF8LowerCase was heavily optimized. UTF8UpperCase still needs more optimization. 6 million UTF8LowerCase operations on the string АБВЕЁЖЗКЛМНОПРДЙГ take 2,6 seconds on my computer. It outperforms iconv by a factor of approx. 2,5x.

Performance test took:

  j    UTF8LowerCase    CWString SysUtils.UnicodeLowerCase
  0         804 ms          2456 ms
  1        1896 ms          2461 ms
  2        2318 ms          6594 ms
  3        3460 ms          6170 ms
  4        2647 ms          5347 ms
  5        1847 ms          6939 ms
  6        2526 ms          4398 ms
  7        2496 ms          4429 ms
  8        1830 ms          2285 ms
  9        1975 ms          2411 ms

For these strings:

  if j = 0 then Str := UTF8LowerCase('abcdefghijklmnopqrstuwvxyz');
  if j = 1 then Str := UTF8LowerCase('ABCDEFGHIJKLMNOPQRSTUWVXYZ');
  if j = 2 then Str := UTF8LowerCase('aąbcćdeęfghijklłmnńoóprsśtuwyzźż');
  if j = 3 then Str := UTF8LowerCase('AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻ');
  if j = 4 then Str := UTF8LowerCase('АБВЕЁЖЗКЛМНОПРДЙГ');
  if j = 5 then Str := UTF8LowerCase('名字叫嘉英,嘉陵江的嘉,英國的英');
  if j = 6 then Str := UTF8LowerCase('AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuWvVwXxYyZz');
  if j = 7 then Str := UTF8LowerCase('AAaaBBbbCCccDDddEEeeFFffGGggHHhhIIiiJJjjKKkkLLllMMmm');
  if j = 8 then Str := UTF8LowerCase('abcDefgHijkLmnoPqrsTuwvXyz');
  if j = 9 then Str := UTF8LowerCase('ABCdEFGhIJKlMNOpQRStUWVxYZ');

> Do UTF8LowerCase and UTF8UpperCase cover all upper/lowercase Unicode (possibly accented) characters?

UTF8LowerCase currently covers all characters in the latest Unicode spec AFAIK. Of course I might have forgotten something, but I have tests for chars from to 0580 and more tests for other clusters. UTF8UpperCase is currently implemented from to 0450, but I will add the rest.

> Does it handle decomposed characters (cwstring doesn't)?

I think that decomposed characters should work naturally. See, for example, if we have:

  [0]=~ (tilde accent, but the special version for composition)
  [1]=A

which forms Ã, and then we pass lowercase into it, we would get [0] without change and [1]=a, which forms ã. Or am I wrong?

If you are talking about handling for CompareText, then the answer would be that AFAIK it would be too inefficient to handle that in CompareText ... so we would need another routine for that, NormalizedCompareText or something like that, which executes normalization, then lowercase and finally the comparison.

--
Felipe Monteiro de Carvalho
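[Editor's sketch] The point about decomposed characters can be illustrated with a small program. It is only a sketch under stated assumptions: AsciiLowerUtf8 is a hypothetical helper written for this note, not the LazUtils UTF8LowerCase, and it only handles ASCII letters. Note also that in Unicode the combining mark follows its base letter, so the decomposed form of Ã is 'A' followed by U+0303 (UTF-8 bytes CC 83).

```pascal
program DecomposedDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not the LazUtils routine): lowercases ASCII letters
// and copies every other byte of the UTF-8 string unchanged.
function AsciiLowerUtf8(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if Result[i] in ['A'..'Z'] then
      Result[i] := Chr(Ord(Result[i]) + 32);
end;

const
  // 'A' + U+0303 COMBINING TILDE, written as raw UTF-8 bytes
  UpperDecomposed = 'A'#$CC#$83;
  // 'a' + U+0303 COMBINING TILDE
  LowerDecomposed = 'a'#$CC#$83;

begin
  // The base letter is lowercased; the combining mark passes through as is.
  if AsciiLowerUtf8(UpperDecomposed) = LowerDecomposed then
    WriteLn('combining mark preserved')
  else
    WriteLn('mismatch');
end.
```

This only covers case mapping. Comparing a decomposed string with its precomposed equivalent still needs a normalization step first, which is what the NormalizedCompareText idea above is about.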
Re: [Lazarus] cwstring in arm-linux
On 10/19/11, Vincent Snijders vincent.snijd...@gmail.com wrote:
> I guess Felipe gave up waiting on a Unicode RTL for the time being and goes for a full UTF8 pseudo RTL in LazUtils.

Well, after a lot of discussion I got convinced that Lazarus should give the UTF-8 mode of the RTL a try when it appears, and this might be very useful for our usage of TStringList, TComponent, TStream, etc. I think this solution has major problems, but it was claimed that my proposed solutions have much worse problems, so in the end I concluded that we should try the UTF-8 mode of the RTL when it appears.

But this does not mean that LazUtils would not be useful then. My proposals to add UTF-8 routines to the RTL and even FCL were rejected, so we UTF-8 users would be stuck with only whatever routines Embarcadero invents. That's not nearly good enough and not nearly fast enough.

UTF8LowerCase is far superior to the existing RTL LowerCase. To start with, the RTL in the existing release doesn't even have a UTF8String LowerCase (no idea about 2.7). Also, UTF8LowerCase has a second parameter to specify the language, so we can test and support Turkish without having to change our locale to Turkish. It outperforms SysUtils.UnicodeLowerCase by approx. 250% on my Mac, and it has zero external dependencies, zero initialization code and zero global variables, in 1k lines of code (half of them comments), which is not that much. As you can see, it vastly outperforms even what the UTF-8 mode of the RTL would offer for this.

Just like UTF8LowerCase, other things provided by LazUtils will also be useful options for Lazarus and other libraries/applications, regardless of FPC offering something similar.

And then I think that everyone will be happy. People that want Delphi compatibility (excluding string and PChar, since they will not match in the RTL mode used by Lazarus) will be happy: they can use RTL routines and get compatibility. Lazarus will still be using string and TStringList, TComponent, etc.

--
Felipe Monteiro de Carvalho
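[Editor's sketch] To make the "second parameter to specify the language" concrete, here is a minimal illustration. SketchLowerCase and its Lang parameter are hypothetical names invented for this note; the function handles ASCII input only and is not the LazUtils implementation. It shows the one rule that makes Turkish and Azeri special for lowercasing: 'I' maps to dotless ı (U+0131, UTF-8 bytes C4 B1) instead of 'i'.

```pascal
program LangLowerDemo;
{$mode objfpc}{$H+}

// Hypothetical sketch: ASCII-only lowercase with an optional language code.
// For 'tr' and 'az', capital 'I' becomes dotless i (U+0131).
function SketchLowerCase(const S: string; const Lang: string = ''): string;
var
  i: Integer;
  Turkic: Boolean;
begin
  Turkic := (Lang = 'tr') or (Lang = 'az');
  Result := '';
  for i := 1 to Length(S) do
    if Turkic and (S[i] = 'I') then
      Result := Result + #$C4#$B1            // UTF-8 encoding of U+0131
    else if S[i] in ['A'..'Z'] then
      Result := Result + Chr(Ord(S[i]) + 32)
    else
      Result := Result + S[i];
end;

begin
  // The language is passed explicitly, so the process locale is never touched.
  if SketchLowerCase('TITLE') = 'title' then
    WriteLn('default rule ok');
  if SketchLowerCase('TITLE', 'tr') = 't'#$C4#$B1'tle' then
    WriteLn('Turkish rule ok');
end.
```

Passing the language as an argument is what allows a test suite to exercise the Turkish rule on a machine whose locale is something else entirely.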
Re: [Lazarus] cwstring in arm-linux
On Wed, 19 Oct 2011, Felipe Monteiro de Carvalho wrote:
> On 10/19/11, Vincent Snijders vincent.snijd...@gmail.com wrote:
>> I guess Felipe gave up waiting on a Unicode RTL for the time being and goes for a full UTF8 pseudo RTL in LazUtils.
>
> Well, after a lot of discussion I got convinced that Lazarus should give the UTF-8 mode of the RTL a try when it appears, and this might be very useful for our usage of TStringList, TComponent, TStream, etc. I think this solution has major problems, but it was claimed that my proposed solutions have much worse problems, so in the end I concluded that we should try the UTF-8 mode of the RTL when it appears.
>
> But this does not mean that LazUtils would not be useful then. My proposals to add UTF-8 routines to the RTL and even FCL were rejected,

Correction: Your proposals were not rejected.

No decision as to which character sets will be used in the basic RTL has been taken. Any action you take now is therefore premature. So it was suggested that you wait until things settle down and the final shape of things is clearer.

This really is not the same as 'rejected'.

Michael.
Re: [Lazarus] cwstring in arm-linux
2011/10/19 Michael Van Canneyt mich...@freepascal.org:
> On Wed, 19 Oct 2011, Felipe Monteiro de Carvalho wrote:
>> On 10/19/11, Vincent Snijders vincent.snijd...@gmail.com wrote:
>>> I guess Felipe gave up waiting on a Unicode RTL for the time being and goes for a full UTF8 pseudo RTL in LazUtils.
>>
>> But this does not mean that LazUtils would not be useful then. My proposals to add UTF-8 routines to the RTL and even FCL were rejected,
>
> Correction: Your proposals were not rejected.

Thanks for the clarification.

> No decision as to which character sets will be used in the basic RTL has been taken. Any action you take now is therefore premature. So it was suggested that you wait until things settle down and the final shape of things is clearer.

That is why I said: gave up waiting

> This really is not the same as 'rejected'.

Vincent
Re: [Lazarus] cwstring in arm-linux
Hello,

On 2011-10-19 21:03, Felipe Monteiro de Carvalho wrote:
> On Wed, Oct 19, 2011 at 6:33 PM, Martin Schreiber mse00...@gmail.com wrote:
>> Does it use locale specific collation in PasUnicodeCompareStr and PasUnicodeCompareText?
>
> Good point, no, not yet. But this affects only Turkish, Azeri and Lithuanian AFAIK. Adding Turkish and Azeri is trivial, because UTF8LowerCase supports them, but I have not yet understood the rules for Lithuanian; they are quite convoluted and depend on nearby chars and stuff like that.

I am a native Lithuanian, so I think I can help, at least by providing info, but I must understand what the problem is first.

Do I understand correctly that collation means sorting order? In that case Lithuanian does not depend on nearby characters. There are 32 letters and they follow this order:

  Aa Ąą Bb Cc Čč Dd Ee Ęę Ėė Ff Gg Hh Ii Įį Yy Jj Kk Ll Mm Nn Oo Pp Rr Ss Šš Tt Uu Ųų Ūū Vv Zz Žž

And there are some accented characters which are used only in linguistic texts (for example, dictionaries). (The full list is here: http://developer.mimer.com/charts/lithuanian.htm)

The funny thing is that in dictionaries, when sorting words, Aa and Ąą (also: Ee and Ęę and Ėė; Ii and Įį and Yy; Uu and Ųų and Ūū) are treated as the same letter. BUT, for example, the words šieną sieną sieną - all three are different words (no accents in these characters). BUT I believe that accented characters should be treated as the same letter: šiẽną = šieną; siena = síena, because it is the same word (accents do not change the word's meaning and the writer is not required to provide them at all).

I don't know if I managed to explain anything, but if you need some help with the Lithuanian language - feel free to contact me.

Regards,
Žilvinas Ledas
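[Editor's sketch] The alphabet order given above can be turned into a simple rank-based comparison. Everything here is an assumption made for illustration: LtAlphabet, LtRank and LtCompare are hypothetical names, the input is assumed to be lowercase already, and the dictionary convention of treating a/ą, e/ę/ė, i/į/y and u/ų/ū as one letter is deliberately not implemented. The letters are written as code-point escapes so that the source file stays plain ASCII.

```pascal
program LtOrderDemo;
{$mode objfpc}{$H+}

const
  // Lowercase Lithuanian alphabet in the order listed above (32 letters):
  // a ą b c č d e ę ė f g h i į y j k l m n o p r s š t u ų ū v z ž
  LtAlphabet: UnicodeString =
    'a'#$0105'bc'#$010D'de'#$0119#$0117'fghi'#$012F'yjklmnoprs'#$0161 +
    'tu'#$0173#$016B'vz'#$017E;

// Position of a letter in LtAlphabet, or 0 if it is not in the alphabet.
function LtRank(C: WideChar): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(LtAlphabet) do
    if LtAlphabet[i] = C then
      Exit(i);
end;

// Negative if A sorts before B, positive if after, zero if equal.
function LtCompare(const A, B: UnicodeString): Integer;
var
  i, n: Integer;
begin
  n := Length(A);
  if Length(B) < n then
    n := Length(B);
  for i := 1 to n do
  begin
    Result := LtRank(A[i]) - LtRank(B[i]);
    if Result <> 0 then
      Exit;
  end;
  // Common prefix is equal: the shorter string sorts first.
  Result := Length(A) - Length(B);
end;

begin
  // 'y' sits between 'į' and 'j', unlike in plain code-point order.
  if LtCompare('y', 'j') < 0 then
    WriteLn('y sorts before j');
  if LtCompare('z', 'v') > 0 then
    WriteLn('z sorts after v');
end.
```

A table lookup like this has no dependence on neighbouring characters, which matches the description above of how plain Lithuanian sorting works.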
Re: [Lazarus] cwstring in arm-linux
Hello,

2011/10/19 Žilvinas Ledas zilvinas.le...@dict.lt:
> I am a native Lithuanian, so I think I can help, at least by providing info, but I must understand what the problem is first.

I am mostly interested in LowerCase / UpperCase. Could you explain how it works in Lithuanian and provide test cases for it?

Test cases should be in this format:

  AssertStringOperationUTF8LowerCase('Unicode 0460 UTF8LowerCase', '', 'ѠѡѢѣѤѥѦѧѨѩѪѫѬѭѮѯ', 'ѡѡѣѣѥѥѧѧѩѩѫѫѭѭѯѯ');

Even better if they are in patches to the file lazarus/tests/lazutils/testunicode.pas

The first param is the label, the second the locale (in this case maybe something like 'lt'; what is the ISO identifier for Lithuanian?), then the UpperCase string and then the LowerCase string.

And try to make some tricky tests, to defeat partial implementations. 3 tests for lowercase and 3 for uppercase should be enough.

--
Felipe Monteiro de Carvalho
[Lazarus] cwstring in arm-linux
Hi list

In revision 31913, Felipe did this:

r31913 | sekelsenmat | 2011-08-08 12:23:08 +0100 (Mon, 08 Aug 2011) | 1 line

Fixes linking LCL-Android by disabling all links to libc in arm-linux

--- lcl/include/lcl_defines.inc (revision 31912)
+++ lcl/include/lcl_defines.inc (revision 31913)
@@ -1,2 +1,8 @@
 // Add defines here. This file should be included in all LCL units headers
-{$define UseCLDefault}
\ No newline at end of file
+{$define UseCLDefault}
+
+// For Android and other ARM-devices, otherwise the LCL will dependent on libc
+{$IFDEF ARM}{$IFDEF UNIX}
+  {$DEFINE DisableCWString}
+  {$DEFINE DisableIconv}
+{$ENDIF}{$ENDIF}

Could someone change this ifdef to ANDROID, or something other than ARM?

Henry
Re: [Lazarus] cwstring in arm-linux
On Mon, Oct 3, 2011 at 2:18 PM, Henry Vermaak henry.verm...@gmail.com wrote:
> Could someone change this ifdef to ANDROID, or something other than ARM?

There is no such define in FPC, so either way is problematic. I'll check if I can finish paswstring instead.

--
Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On 03/10/11 13:52, Felipe Monteiro de Carvalho wrote:
> On Mon, Oct 3, 2011 at 2:18 PM, Henry Vermaak henry.verm...@gmail.com wrote:
>> Could someone change this ifdef to ANDROID, or something other than ARM?
>
> There is no such define in FPC, so either way is problematic.

Yes, I know, but you can define ANDROID when you build lazarus from the command line. All my arm-linux systems have libc. I run a normal linux distro on my arm netbook, so the only alternative for me would be to patch that file.

Henry
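[Editor's sketch] What Henry is asking for might look like the guard below. This is an assumption about the shape of the change, not the content of any actual revision: ANDROID here is a symbol the builder supplies, since the compiler does not define it at this point in the thread. The small program around the guard only exists so the effect can be observed.

```pascal
program DefineDemo;
{$mode objfpc}

// Sketch of the guard keyed on a user-supplied ANDROID symbol instead of
// ARM + UNIX, so ordinary arm-linux builds keep cwstring and iconv.
{$IFDEF ANDROID}
  {$DEFINE DisableCWString}
  {$DEFINE DisableIconv}
{$ENDIF}

begin
  {$IFDEF DisableCWString}
  WriteLn('cwstring disabled');
  {$ELSE}
  WriteLn('cwstring enabled');
  {$ENDIF}
end.
```

Compiled normally, the program takes the "enabled" branch; compiled with the symbol defined on the command line (for example `fpc -dANDROID definedemo.pas`, where the file name is just an example), it takes the "disabled" branch.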
Re: [Lazarus] cwstring in arm-linux
Ok, I changed the define in rev 32655. But you should note that when paswstring gets finished it will phase out cwstrings.

--
Felipe Monteiro de Carvalho
Re: [Lazarus] cwstring in arm-linux
On 03/10/11 15:31, Felipe Monteiro de Carvalho wrote:
> Ok, I changed the define in rev 32655. But you should note that when paswstring gets finished it will phase out cwstrings.

That's good news, thanks!

Henry
Re: [Lazarus] cwstring in arm-linux
On Monday 03 of October 2011 16:31:20 Felipe Monteiro de Carvalho wrote:
> Ok, I changed the define in rev 32655. But you should note that when paswstring gets finished it will phase out cwstrings.

Only if it's 1/1 with cwstrings please :)

zeljko