Re: [fpc-devel] assign constant text to widestring
> UpperCase, LowerCase, CapitalCase, WordBreak, ParagraphBreak, ... almost all have some language exceptions.

I don't doubt that you are right here, but I don't think there is any support for this in the RTL, so it seems a lot less relevant than general Unicode handling. I think we should first have decent Unicode support, e.g. assigning string constants to WideStrings correctly, independent of the encoding the source file is stored in (maybe this is a Lazarus bug and FPC can't help it, as it gets called erroneously), and correct automatic conversions when assigning a UTF8String to a WideString and vice versa.

-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] assign constant text to widestring
On 24 Oct 2008, at 01:46, Felipe Monteiro de Carvalho wrote:

> I agree with Daniël on this one. Simplify.
>
> ë --> Ë always
>
> If you need something which takes the language into consideration, then build another routine with more parameters.

UpperCase and LowerCase are mapped to OS routines which do take the current locale into account (at least on *nix). So unless those routines always do ë --> Ë regardless of the locale, you will not get this behaviour.

Jonas
Re: [fpc-devel] assign constant text to widestring
> Last time I checked it on Windows no UpperCase is performed for WideString for codepoints 127, maybe it has been changed recently (1 month).

With which software? In my tests, if I create the WideString correctly (which has to be done with an explicit conversion from UTF8String), UpperCase works for the German umlauts äöü.

With Turbo Delphi it works easily, as WideStrings are correctly converted from AnsiStrings automatically.

-Michael
Re: [fpc-devel] assign constant text to widestring
On 24 Oct 2008, at 13:59, Michael Schnell wrote:

>> Last time I checked it on Windows no UpperCase is performed for WideString for codepoints 127, maybe it has been changed recently (1 month).
>
> With which software? In my tests, if I create the WideString correctly (which has to be done with an explicit conversion from UTF8String), UpperCase works for the German umlauts äöü.
>
> With Turbo Delphi it works easily, as WideStrings are correctly converted from AnsiStrings automatically.

They are in FPC as well, at least if your ansistrings contain ansi-encoded strings. I think you are mixing up Lazarus and FPC in the above (because Lazarus puts UTF-8 encoded stuff in ansistrings).

Jonas
Re: [fpc-devel] assign constant text to widestring
> They are in FPC as well, at least if your ansistrings contain ansi-encoded strings. I think you are mixing up Lazarus and FPC in the above (because Lazarus puts UTF-8 encoded stuff in ansistrings).

Sorry, I indeed forgot to write that I am using Lazarus. Here Unicode constants seem to be not WideStrings (as seemingly with pure FPC) but UTF8Strings, and this causes the problem.

-Michael
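The mix-up described above (UTF-8 bytes held in an 8-bit string and then widened as if they were ANSI) can be reproduced outside of FPC. The following is a minimal Python sketch, not FPC or Lazarus code. It assumes that the ANSI code page is Windows-1252, which the thread does not state.

```python
# Sketch only: Python stands in for an AnsiString -> WideString conversion.
# Assumption: the "ANSI" code page is Windows-1252 (cp1252).

text = "\u00e4\u00f6\u00fc"  # the German umlauts mentioned in the thread

# What Lazarus is described as storing in an AnsiString: UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Widening those bytes as if they were ANSI text produces mojibake.
misread = utf8_bytes.decode("cp1252")

# Widening them as UTF-8 (the explicit conversion) restores the text.
correct = utf8_bytes.decode("utf-8")

print(repr(misread))
print(repr(correct))
```

Each umlaut occupies two bytes in UTF-8, so the misread string has six characters instead of three, and upper-casing it cannot produce the expected result.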
Re: [fpc-devel] assign constant text to widestring
Daniël Mantione wrote:

> The issue might be the UCS-2 encoding of your source. Perhaps try to feed the compiler UTF-8. I didn't even know the compiler accepts UCS-2; it may not work correctly.

The compiler definitely eats no UCS-2 encoded sources.
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

> A decent system should be able to do the necessary conversions automatically:

This is a simplified view which ignores the resource waste of this approach, which is not visible in the academic example below. Converting between UTF-8 and UTF-16 is a very expensive operation; the compiler would have to insert it all over the place, and people would cry about the performance of their programs.
Re: [fpc-devel] assign constant text to widestring
> The compiler definitely eats no UCS-2 encoded sources.

I did check several times. My source file looks like this when I open it with UltraEdit and tell it to show hex:

FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00   ..u.n.i.t. .U.n.

Then I created a Delphi program and read the file with TFileStream. There I found UTF-8 encoded data without a BOM. So Windows seems to play some nasty tricks on us. Supposedly FPC reads the file in a similar way to TFileStream, and thus the compiler in fact sees UTF-8.

All this is really nasty stuff!!!

-Michael
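For reference, the hex dump quoted above can be checked mechanically. The sketch below is Python rather than Pascal, and the helper name sniff_bom is made up for illustration. It only tests the first bytes for a byte order mark and decodes the rest. It deliberately ignores UTF-32, whose little-endian BOM also starts with FF FE.

```python
import codecs

def sniff_bom(data):
    """Guess an encoding from a leading byte order mark (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    return "unknown"

# The bytes shown in the hex view, as quoted in the message.
dump = bytes.fromhex("FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00")

encoding = sniff_bom(dump)
text = dump[2:].decode("utf-16-le")   # skip the two BOM bytes

# The same text stored as UTF-8 without a BOM is much shorter on disk.
as_utf8 = text.encode("utf-8")

print(encoding, repr(text), len(dump), len(as_utf8))
```

The size difference (16 bytes against 7 for this fragment) is why comparing the file length on disk with what an editor displays is a quick way to tell which encoding is really stored.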
Re: [fpc-devel] assign constant text to widestring
On Thu, 23 Oct 2008, Michael Schnell wrote:

>> The compiler definitely eats no UCS-2 encoded sources.
>
> I did check several times. My source file looks like this when I open it with UltraEdit and tell it to show hex:
>
> FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00   ..u.n.i.t. .U.n.
>
> Then I created a Delphi program and read the file with TFileStream. There I found UTF-8 encoded data without a BOM. So Windows seems to play some nasty tricks on us. Supposedly FPC reads the file in a similar way to TFileStream, and thus the compiler in fact sees UTF-8.

The compiler uses blockread and performs no such conversions.

Daniël
Re: [fpc-devel] assign constant text to widestring
> As has been said before: the compiler itself simply does not support UCS-2. Regardless of any BOM, compiler setting or Lazarus setting, it will not understand it.

See my other post in this thread: Windows XP seems to play some tricks on us here, so that UltraEdit sees the UCS-2 encoded file while the compiler gets fed UTF-8. I don't think I understand why or how this happens.

-Michael
Re: [fpc-devel] assign constant text to widestring
> Converting between UTF-8 and UTF-16 is a very expensive operation; the compiler would have to insert it all over the place, and people would cry about the performance of their programs.

Of course I agree. If you care about performance you need to know what to do: either use WideString all over the place and beware of the LCL API, or use UTF8String all over the place.

But if you use UTF8String you need to be aware that you can't do simple and totally normal things like

s := copy(s, 1, 3);

to get the first three characters of a string. Really finding the first three characters of a string is an interesting and time-consuming task with UTF-8 ;).

That is why I feel it would be a lot better if the LCL used a WideString API.

-Michael
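The difference between "first three bytes" and "first three characters" of a UTF-8 string can be shown with a short sketch. It is written in Python, not Pascal, and utf8_prefix is a hypothetical helper, not an RTL or LCL routine. It counts code points only, so it does not address combining sequences.

```python
def utf8_prefix(data, count):
    """Return the leading bytes of `data` that hold `count` code points.

    A byte of the form 10xxxxxx is a continuation byte; every other byte
    starts a new code point, so the string has to be scanned from the start.
    """
    seen = 0
    for index, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:      # ASCII or lead byte
            if seen == count:
                return data[:index]
            seen += 1
    return data

data = "\u00e4\u00f6\u00fc-test".encode("utf-8")   # umlauts take 2 bytes each

naive = data[:3]                  # cuts the second umlaut in half
careful = utf8_prefix(data, 3)    # six bytes, three complete code points

print(naive, careful.decode("utf-8"))
```

The scan is linear in the length of the prefix, which is the cost referred to above. The naive slice is not valid UTF-8 any more and fails to decode.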
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

>> Converting between UTF-8 and UTF-16 is a very expensive operation; the compiler would have to insert it all over the place, and people would cry about the performance of their programs.
>
> Of course I agree. If you care about performance you need to know what to do: either use WideString all over the place and beware of the LCL API, or use UTF8String all over the place.
>
> But if you use UTF8String you need to be aware that you can't do simple and totally normal things like
>
> s := copy(s, 1, 3);
>
> to get the first three characters of a string. Really finding the first three characters of a string is an interesting and time-consuming task with UTF-8 ;).
>
> That is why I feel it would be a lot better if the LCL used a WideString API.

If you want widestring, then maybe mseide is a better option for you.

Vincent
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

>> Converting between UTF-8 and UTF-16 is a very expensive operation; the compiler would have to insert it all over the place, and people would cry about the performance of their programs.
>
> Of course I agree. If you care about performance you need to know what to do: either use WideString all over the place and beware of the LCL API, or use UTF8String all over the place.
>
> But if you use UTF8String you need to be aware that you can't do simple and totally normal things like
>
> s := copy(s, 1, 3);
>
> to get the first three characters of a string. Really finding the first three characters of a string is an interesting and time-consuming task with UTF-8 ;).

This is also a simplified view.

- Firstly, which real-world (!) task really requires an operation like this? Mostly it's something like copy(s,pos(...),...);
- Secondly, a properly coded UTF-16 application shouldn't do this either: it doesn't handle surrogates properly, and e.g. umlauts can be encoded in all UTF flavours as two chars: base letter plus the umlaut (the two dots).
Re: [fpc-devel] assign constant text to widestring
> a properly coded UTF-16 application shouldn't do this either: it doesn't handle surrogates properly

Right you are. For me WideString is UCS-2 and not UTF-16, as I regard it as a sequence of WideChar, so that Unicode user code can be written using WideChar and WideString.

WideChar only has 16 bits, so this restricts us to Unicode characters $. I doubt that I will ever need Unicode characters $, but of course there _are_ other projects.

-Michael
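To make the UCS-2 versus UTF-16 distinction concrete: a code point outside the 16-bit range is stored as two 16-bit units (a surrogate pair), so a "one WideChar is one character" view breaks for it. The sketch below is Python, and utf16_units is a made-up helper name. U+1D11E (musical symbol G clef) is just one example of such a code point.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units of `text` in UTF-16 (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

bmp_char = "\u00e4"            # a-umlaut, fits in one 16-bit unit
astral_char = "\U0001D11E"     # outside the 16-bit range

print([hex(u) for u in utf16_units(bmp_char)])
print([hex(u) for u in utf16_units(astral_char)])
```

Indexing or copying by 16-bit unit can therefore split the second string in the middle of a character, just as byte indexing can for UTF-8.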
Re: [fpc-devel] assign constant text to widestring
In our previous episode, Florian Klaempfl said:

>> But if you use UTF8String you need to be aware that you can't do simple and totally normal things like
>>
>> s := copy(s, 1, 3);
>>
>> to get the first three characters of a string. Really finding the first three characters of a string is an interesting and time-consuming task with UTF-8 ;).
>
> This is also a simplified view.
>
> - Firstly, which real-world (!) task really requires an operation like this? Mostly it's something like copy(s,pos(...),...);
> - Secondly, a properly coded UTF-16 application shouldn't do this either: it doesn't handle surrogates properly, and e.g. umlauts can be encoded in all UTF flavours as two chars: base letter plus the umlaut (the two dots).

More importantly, most such routines will be implicitly tied to a certain language or language group already. The idea that UCS-2 simply expands the character range and the rest stays the same is naive.
Re: [fpc-devel] assign constant text to widestring
> UltraEdit might fool you here. It edits either ANSI or UCS-2. If you have a UTF-8 encoded file, it will show the contents in hex as being UCS-2.

That might be. But would it even virtually insert a BOM?!? Why should it do this when using the hex editor?

-Michael
Re: [fpc-devel] assign constant text to widestring
> More importantly, most such routines will be implicitly tied to a certain language or language group already.

Which kinds of UCS-2 based functions do you think are tied to a language (group)?

-Michael
Re: [fpc-devel] assign constant text to widestring
On 23 Oct 2008, at 13:41, Michael Schnell wrote:

>> a properly coded UTF-16 application shouldn't do this either: it doesn't handle surrogates properly
>
> Right you are. For me WideString is UCS-2 and not UTF-16, as I regard it as a sequence of WideChar, so that Unicode user code can be written using WideChar and WideString.
>
> WideChar only has 16 bits, so this restricts us to Unicode characters $. I doubt that I will ever need Unicode characters $, but of course there _are_ other projects.

I doubt that you will never need to support decomposed characters (such as ä being encoded as basically a¨). It's not that uncommon.

Jonas
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

>> More importantly, most such routines will be implicitly tied to a certain language or language group already.
>
> Which kinds of UCS-2 based functions do you think are tied to a language (group)?

Bidi stuff? You are aware of the fact that Unicode strings can contain e.g. bidi markers?
Re: [fpc-devel] assign constant text to widestring
On Thursday 23 October 2008 13.31:30 Florian Klaempfl wrote:

> This is also a simplified view.
>
> - Firstly, which real-world (!) task really requires an operation like this? Mostly it's something like copy(s,pos(...),...);
> - Secondly, a properly coded UTF-16 application shouldn't do this either: it doesn't handle surrogates properly, and e.g. umlauts can be encoded in all UTF flavours as two chars: base letter plus the umlaut (the two dots).

One should normalize Unicode text before processing. If it is normalized to fully composed form, there will be no problems with UCS-2 single-character processing in Western Europe. The GUI kit should return fully composed characters whenever possible to simplify the user's life.

Martin
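Normalizing to fully composed form, as suggested above, corresponds to Unicode normalization form NFC. The sketch below uses Python's standard unicodedata module purely to illustrate the idea. It says nothing about what the FPC RTL or any GUI kit offers.

```python
import unicodedata

decomposed = "a\u0308"     # base letter a + combining diaeresis (two code points)
composed = "\u00e4"        # precomposed a-umlaut (one code point)

# The two spellings render the same but compare unequal until normalized.
normalized = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(normalized), normalized == composed)
```

After NFC the umlaut is a single code point again, so per-character processing of Western European text behaves as expected. Sequences that have no precomposed form remain multi-code-point even after NFC.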
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

>> UltraEdit might fool you here. It edits either ANSI or UCS-2. If you have a UTF-8 encoded file, it will show the contents in hex as being UCS-2.
>
> That might be. But would it even virtually insert a BOM?!? Why should it do this when using the hex editor?

Because it converts the UTF-8 file internally to UCS-2 on read, before editing.

Marc
Re: [fpc-devel] assign constant text to widestring
> Bidi stuff? You are aware of the fact that Unicode strings can contain e.g. bidi markers?

Sorry, never heard of bidi :(

-Michael
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

>> Bidi stuff? You are aware of the fact that Unicode strings can contain e.g. bidi markers?
>
> Sorry, never heard of bidi :(

http://www.unicode.org/reports/tr9/
Re: [fpc-devel] assign constant text to widestring
> If you want widestring, then maybe mseide is a better option for you.

Again, I do know this, and in fact I don't have a project that needs Unicode. But the reason I started this thread is to help make Lazarus / FPC even more useful.

-Michael
Re: [fpc-devel] assign constant text to widestring
> Because it converts the UTF-8 file internally to UCS-2 on read, before editing.

Seems really silly to me.

But the file length really did indicate that it is UTF-8 encoded, and when looking at the file with WinCommander's hex viewer it is UTF-8. So I suppose that you are right and the nasty trick is UltraEdit's.

-Michael
Re: [fpc-devel] assign constant text to widestring
On Thursday 23 October 2008 13.58:04 Michael Schnell wrote:

>> Bidi stuff? You are aware of the fact that Unicode strings can contain e.g. bidi markers?
>
> Sorry, never heard of bidi :(

Bidirectional text. Much more important than the hypothetical codepoints above the BMP. MSEgui does not support bidi BTW, too difficult. ;-)

Martin
Re: [fpc-devel] assign constant text to widestring
> I doubt that you will never need to support decomposed characters (such as ä being encoded as basically a¨). It's not that uncommon.

This is the nasty old stuff that Unicode should help us get rid of.

-Michael
Re: [fpc-devel] assign constant text to widestring
Michael Schnell wrote:

>> Because it converts the UTF-8 file internally to UCS-2 on read, before editing.
>
> Seems really silly to me.

No, it's not. This way you only have to support two editors internally: one with byte chars and one with word chars (ignoring surrogates and other stuff).

> But the file length really did indicate that it is UTF-8 encoded, and when looking at the file with WinCommander's hex viewer it is UTF-8. So I suppose that you are right and the nasty trick is UltraEdit's.

Yes, since I find auto conversion by the OS very unlikely (yes, I once tripped over this with UltraEdit too).

Marc
Re: [fpc-devel] assign constant text to widestring
> http://www.unicode.org/reports/tr9/

Thanks. I see. (In fact I even wrote embedded software for a display that can show Hebrew text. But this was with ANSI code.)

-Michael
Re: [fpc-devel] assign constant text to widestring
DM> Example: In Dutch uppercase characters generally do not get tremas: Daniël becomes DANIEL. Should an uppercase routine worry? No, this is a spelling convention, the correct uppercase of ë is Ë, we should not confuse spelling with uppercasing.

No. This is not a spelling convention. It is a rule dictated by the language the word is written in. If the word Daniël is Dutch, then its uppercase is:

UpperCase(Daniël, langDutch) --> DANIEL

Fine. Yet if we don't know what language it is written in, then the uppercase is:

UpperCase(Daniël, langUndefined) --> DANIËL

Now, as I don't know Dutch at all, I wonder what the LowerCase transform would be for the same uppercased word, DANIEL:

LowerCase(DANIEL, langDutch) --> daniel

or

LowerCase(DANIEL, langDutch) --> daniël

or both? If both, how do you pick the correct one?

> Example also, in Spanish sólo is different than SOLO and the meaning is different (alone vs. only).

Yes, it is imperative that we know the language the word is in, so that

UpperCase(sólo, langSpanish) --> SÓLO
UpperCase(solo, langSpanish) --> SOLO

Otherwise we may end up altering the meaning of the text. UpperCase() and LowerCase() should not alter the meaning of the text. This would be a crime in any other context.
Re: [fpc-devel] assign constant text to widestring
I agree with Daniël on this one. Simplify.

ë --> Ë always

If you need something which takes the language into consideration, then build another routine with more parameters.

--
Felipe Monteiro de Carvalho
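A "routine with more parameters" could look roughly like the sketch below. It is Python, not Pascal, and everything in it is hypothetical: the name upper_case, the lang parameter and the "nl" tag are invented for illustration, and the Dutch rule is simply the claim made in this thread, not a verified orthography rule. With no language given, the function does the plain mapping (ë becomes Ë).

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

def upper_case(text, lang=None):
    """Uppercase `text`; apply the thread's Dutch trema rule when lang == "nl".

    Hypothetical API for illustration only.
    """
    upper = text.upper()                      # language-neutral mapping
    if lang == "nl":
        # Decompose, drop the diaeresis, recompose what is left.
        decomposed = unicodedata.normalize("NFD", upper)
        stripped = decomposed.replace(COMBINING_DIAERESIS, "")
        return unicodedata.normalize("NFC", stripped)
    return upper

print(upper_case("Dani\u00ebl"))          # plain mapping keeps the trema
print(upper_case("Dani\u00ebl", "nl"))    # Dutch rule as claimed in the thread
```

A single language tag for the whole string is itself a simplification: text that mixes languages would need the tag per word or per span, and the reverse operation (lowercasing DANIEL) cannot restore a trema that was dropped.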
Re: [fpc-devel] assign constant text to widestring
On 2008-10-24 02:46, Felipe Monteiro de Carvalho wrote:

> I agree with Daniël on this one. Simplify.
>
> ë --> Ë always
>
> If you need something which takes the language into consideration, then build another routine with more parameters.

It's not that simple. How would you uppercase this piece of string

In Dutch uppercase characters generally do not get tremas: Daniël becomes DANIEL.

correctly unless you knew that the substring Daniël is in Dutch while the rest is in English?