Re: [fpc-devel] Forwarded message about FPC status
On 12/24/2012 05:19 PM, Martin Schreiber wrote:
> - Compile at least as fast as Delphi 7

IMHO this is hard to do for a portable system and not very important given modern hardware. I feel that only the linking stage is a viable goal here, since in most cases the vast majority of the already compiled units do not need to be recompiled when doing a make after editing some source code.

-Michael
___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Forwarded message about FPC status
On 12/25/2012 02:22 PM, Florian Klaempfl wrote:
> What's the advantage in doing so? The code hangs around and does not hurt in any way.

+1

-Michael
Re: [fpc-devel] LLVM
On 12/26/2012 11:43 AM, Martin Schreiber wrote:
> Do you have experiences with LLVM? Does it actually create great code?

Let's see what Embarcadero comes up with.

-Michael
Re: [fpc-devel] Feature announcement: Extension of TThread's interface
Great! Thanks a lot.

-Michael
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
On 01/05/2013 12:28 PM, Jonas Maebe wrote:
> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8 encoding of that character.

Sorry, I can't follow. Doesn't #xx just define the numerical value of an 8-bit entity? The interpretation can be done later by whatever code digests the string. Am I wrong?

-Michael
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
On 01/05/2013 01:35 PM, Jy V wrote:
> I do vote for UTF-8

-1

Given that conversions in the RTL (or LCL) are a rather rare runtime task, GUI performance issues do not really need to be considered. The relevant issues seem to be:
- Delphi compatibility, backward compatibility, usability, and runtime performance with time-consuming complex string tasks (these argue against UTF-8 and for either static UTF-16 or a (quasi-)dynamic (CE-alike) encoding);
- memory usage and runtime performance with time-consuming simple string tasks (these argue for locale-based ANSI or UTF-8).

-Michael
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
Once upon a time, on 01/07/2013 12:39 PM to be precise, Michael Schnell said:
> On 01/05/2013 12:28 PM, Jonas Maebe wrote:
>> Using whatever #xx#xx or #xx#xx#xx sequence represents the UTF-8 encoding of that character.
>
> Sorry, I can't follow. Does #xx not just define a numerical representation of an 8 bit entity ? The interpretation in any code might be done later by any code that digests the string. Am I wrong ?

I *think* Jonas is trying to say that if you want the character `Ǿ` in a string you would type either
- 'Ǿ' or
- #$C7#$BE if you want to keep the source free of encoding-specific characters

What you do with it afterwards is up to you as the programmer: if you write it to a UTF-8 terminal, you get `Ǿ`; if you write it to some other terminal, you might see the character that matches $C7 followed by the character that matches $BE in the lookup table of that terminal's encoding.

Look at it this way: the byte sequence ($C7, $BE) has no meaning to the compiler whatsoever; it is just a byte sequence. That is all that matters to the compiler; what the sequence means is for you to decide.

Correct me if I'm wrong.

-- 
Ewald
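Ewald's point that the same two bytes mean different things to different consumers can be checked outside the compiler. The following is a minimal sketch in Python (used here only for illustration, so that it assumes nothing about any particular FPC version); the choice of Latin-1 as the "other terminal" encoding is an assumption made for the example.

```python
# The two bytes that #$C7#$BE puts into an 8-bit string.
raw = bytes([0xC7, 0xBE])

# A consumer that assumes UTF-8 sees a single character, U+01FE.
as_utf8 = raw.decode("utf-8")

# A consumer that assumes Latin-1 (an arbitrary example of "some other
# terminal") sees two characters, one per byte: U+00C7 and U+00BE.
as_latin1 = raw.decode("latin-1")

print([hex(ord(c)) for c in as_utf8])
print([hex(ord(c)) for c in as_latin1])
```

The bytes never change; only the decoding applied by the consumer does, which is the sense in which the sequence has no meaning to the compiler.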
Re: [fpc-devel] LLVM
On 01/07/13 11:02, Michael Schnell wrote:
> Lets see what Embarcadero comes up with

I wouldn't hold my breath. Based on recent Embarcadero history, the first version would be absolute crap, the second version might be beta quality, and the third version might not even exist (removed from the product).

Regards,
- Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
On Mon, January 7, 2013 13:28, Ewald wrote:
> [...]
> I *think* Jonas is trying to say that if you want the character `Ǿ` in a string you would either type
> - 'Ǿ' or
> - #$C7#$BE if you want to keep the source free of encoding specific characters
> [...]

...or
- #$01FE, and then the whole string becomes a Unicode string, which is either kept that way (if it is assigned to a UnicodeString constant) or converted to some 8-bit encoding at compile time (if it is assigned to an 8-bit constant/variable such as an ansistring)

(also just my understanding of what Jonas wrote)

Tomas
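The two outcomes Tomas describes for #$01FE (kept as a 16-bit string, or converted to an 8-bit encoding) can be illustrated with a small Python sketch. It only shows what the resulting bytes would look like for a few target encodings; which encoding the compiler actually picks is not modelled, and the use of cp1252 and little-endian UTF-16 is an assumption made for the example.

```python
ch = chr(0x01FE)  # the code point written as #$01FE

# Kept as a 16-bit string: one UTF-16 code unit (little-endian assumed here).
utf16_units = ch.encode("utf-16-le")

# Converted to an 8-bit string whose encoding is UTF-8: two bytes.
utf8_bytes = ch.encode("utf-8")

# Converted to an 8-bit code page that lacks the character (cp1252 is just
# an example): the character cannot be represented and is replaced.
cp1252_bytes = ch.encode("cp1252", errors="replace")

print(utf16_units, utf8_bytes, cp1252_bytes)
```

The UTF-8 result is exactly the #$C7#$BE pair from the earlier messages, while the code page result shows why the target encoding matters for the 8-bit case.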
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
So the ambiguity with _filling_ a string with data in fact arises when _not_ using the #nn notation :-). With #nn the effect (i.e. the resulting binary) is obvious.

-Michael
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
On 01/07/2013 02:01 PM, Tomas Hajny wrote:
> (also just my understanding of what Jonas wrote)

I feel you are wrong. The string does not know which encoding its content is to be interpreted in (unlike with Delphi XE).

-Michael
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
Once upon a time, on 01/07/2013 02:17 PM to be precise, Michael Schnell said:
> So the ambiguity with _filling_ a string with data in fact arises when _not_ using the #nn notation :-) . With #nn the effect (i.e. the resulting binary) is obvious.

Well, if the sequence $C7, $BE is literally in your source code (that is, you open a hex editor and actually see those values there, one byte each), that would do the same, as the compiler defaults to one-byte strings, I think. The only issue is that you also need to set your code editor to the encoding you want, because otherwise it will garble the display and possibly the binary value of the character.

So yes, I would say the #nn notation is probably the safest to use. It is also handy if your character contains (or is) something that `cannot be there`, like a newline: #10 (or #13#10 under Windows).

Also, if you use a literal UTF-16 char in the code (so not #, but the actual character), I think the {$codepage utf16} directive might come in handy, as otherwise the compiler will interpret this series of bytes as separate single-byte characters. This is not an issue with the # notation, however, as there is no ambiguity in its interpretation.

-- 
Ewald
Re: [fpc-devel] utf8 in 2.6.0
On 5 January 2013 13:39, Mattias Gaertner nc-gaert...@netcologne.de wrote:
> On Sat, 5 Jan 2013 13:06:42 + Frank Church vfcli...@gmail.com wrote:
>> [...]
>> It is obvious that Unicode is not a simple topic and among FPC/Lazarus developers/contributors, I suspect that few if any at all, have a detailed grasp of how it all hangs together in the current state of implementation. It brings to mind the parable of the 12 blind men and the elephant.
>
> The FPC and Lazarus UTF details are not that difficult. The complexity comes from adding Delphi *, third party libraries and old FPC, Lazarus versions.
>
>> I think a diagram or graph of Unicode rules and their current state of implementation in FPC/Lazarus would go a long way to helping both developers and end users in this area. It is a topic which comes up regularly and it doesn't show signs of ever going to be properly resolved.
>
> For Lazarus:
> - works with fpc 2.6.x and 2.7.1
> - LCL and most code expect ansistrings to hold UTF-8.
> - pascal sources, lfm, po files are stored in UTF-8 without BOM. Special care has to be taken, when using widestrings/unicodestring.
> - there are UTF-8 functions and classes (most in package lazutils).
> - the IDE supports many encodings
> - all this is documented via wiki and fpdoc
> - no support for UTF-16 has been started
> [...]
>
> Mattias

Glad to hear this.

-- 
Frank Church
===
http://devblog.brahmancreations.com
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
On Mon, January 7, 2013 14:19, Michael Schnell wrote:
> On 01/07/2013 02:01 PM, Tomas Hajny wrote:
>> (also just my understanding of what Jonas wrote)
>
> I feel you are wrong. The string does not know about the code it's content is to be interpreted in (other than with Delphi XE).

Sorry, your way of quoting makes it difficult for others to react. I freely admit that I may be wrong, but I don't understand what you meant by your comment, and thus I don't understand in what way I am wrong in your view. The compiler obviously knows how the constant is used within the source code, and thus it may proceed accordingly (i.e. either convert it to some 8-bit encoding at compile time if a UTF-16 code constant appears in the source, or keep it in UTF-16 if it is assigned to a UnicodeString constant).

Tomas
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
Once upon a time, on 01/07/2013 05:05 PM to be precise, Tomas Hajny said:
> [...]
> The compiler obviously knows how the constant is used within the source code and thus it may proceed accordingly (i.e. either convert it to some 8-bit encoding at compile time if UTF-16 code constant appears in the source, or keep it in UTF-16 if assigned to a UnicodeString constant).

Yep, the compiler does know how the constant is used and how it is defined (how else could it generate working code?), but I don't see how it could do something with it if it is assigned to another type of string (by type I mean `one-byte versus two-byte`). The compiler can't know for sure what you mean; it can do at least these things:
- Copy data without translating, so a one-char two-byte string becomes a two-char one-byte string; a three-char one-byte string would become a three-char two-byte string; and then there is a paradox: should a three-char two-byte string become a six-char one-byte string? => this is probably not how it is done
- Translate the meanings of the characters of the string, but here the compiler needs to know in what encoding they are and in what encoding the string is wanted (which it doesn't, I believe; the $codepage directive is only used for the encoding of the characters in the unit itself) => I think this isn't a possibility either
- Copy the data byte by byte, but then a one-byte string containing an odd number of chars needs padding, and there are issues with endianness => not really an option, no?
- Truncate every value of a two-byte string to convert it to a one-byte string; the other way around would put each character of the one-byte string as one character in the two-byte string => solves the first paradox, but introduces loss of data
=> All the above options (except the translation, that is) ignore the escape character(s) of the string, so you won't get the data you want.

IMO it (typecasting a one-byte string to a two-byte string) can't be done without human intervention. Look at it this way: typecasting a thread handle to an integer makes no sense either:
- They are related (a thread handle is definitely a number, even if it is a pointer)
- But putting one in the other makes no sense at all: what does `comparing whether a thread id is less than zero` mean? On the other hand, `comparing whether an integer is less than zero` has a distinct meaning.
- The sizes may be different (say a 16-bit integer and a 64-bit thread handle); how do you put one in the other? Sum the bytes together? Multiply them? Take the 16-bit CRC of the handle?

This is IMO the same with a one-byte char and a two-byte char:
- They both represent letters/words/...
- But they are not the same and cannot be typecast without extra knowledge.

This last point is also valid for my example above: you could put all the thread ids you know of in a lookup table and store the index into that table in the 16-bit integer. Fixed. The same goes for our strings: if you know one is UTF-8 and you want to convert it to UTF-16, it can be done without error, but without this extra knowledge it can't give you decisive results.

Just a few points I think bear some potential to contemplate over a cup of $c0ffee ;-)

-- 
Ewald
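Three of the options Ewald lists (copying bytes untranslated, truncating, and translating with known encodings) can be compared side by side. This is a rough Python sketch for illustration only; the helper `transcode` is a hypothetical name, and little-endian UTF-16 is assumed for the two-byte string.

```python
text = "\u01fe"                      # one character, one 16-bit code unit
units16 = text.encode("utf-16-le")   # the two-byte string as raw bytes

# Copy without translating: one two-byte char becomes two one-byte chars,
# and their values depend on byte order.
reinterpreted = list(units16)

# Truncate each 16-bit unit to its low byte: the high byte is lost.
truncated = bytes(ord(c) & 0xFF for c in text)

# Translate: only possible when both encodings are known.
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: decode from src, re-encode as dst."""
    return data.decode(src).encode(dst)

translated = transcode(units16, "utf-16-le", "utf-8")
round_trip = transcode(translated, "utf-8", "utf-16-le")

print(reinterpreted, truncated, translated, round_trip == units16)
```

Only the translation survives a round trip, and it is also the only variant that needs the source and target encodings as extra inputs, which matches the "extra knowledge" argument above.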
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
Tomas Hajny wrote:
> On Mon, January 7, 2013 13:28, Ewald wrote:
>> [...]
>> I *think* Jonas is trying to say that if you want the character `Ǿ` in a string you would either type
>> - 'Ǿ' or
>> - #$C7#$BE if you want to keep the source free of encoding specific characters
>
> ...or
> - #$01FE and then the whole string becomes a Unicode string which is either kept that way (if it is assigned to a UnicodeString constant), or it is converted to some 8-bit encoding at compile time (if it is assigned to an 8-bit constant/variable like ansistring)
>
> (also just my understanding of what Jonas wrote)

That's how I read it as well. In which case, is #$A3 16-bit Unicode (representing the UK £ Sterling) or malformed UTF-8 (should be #$C2#$A3)?

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]
[fpc-devel] Re: Strings - the fun part [very off-topic]
Hi,

this is very off-topic, no serious responses expected... Every time I read anything about strings here, this comes to mind: http://www.rigsofrods.com/entries/155-the-chaos-of-character-encodings

Strings seem to be a general problem for numeric machines :D

d.l.i.w
Re: [fpc-devel] String handling in trunk (was utf8 in 2.6.0)
On Mon, Jan 7, 2013 at 6:05 PM, Mark Morgan Lloyd markmll.fpc-de...@telemetry.co.uk wrote:
> [...]
> That's how I read it as well. In which case, is #$A3 16-bit Unicode (representing the UK £ Sterling) or malformed UTF-8 (should be #$C2#$A3)?

The way I understand it, #$A3 will be affected by the $codepage directive of the source file. So, if the programmer correctly sets $codepage to match the encoding used in the editor (be it UTF-8 or some other encoding), the compiler will also 'understand' that string correctly. If the programmer never uses UnicodeString and always uses the codepage in which the source code was written, everything will work fine: #$A3 will stay whatever it is in that specific encoding.

On the other hand, if a situation arises in which a string containing #$A3 needs to be converted to UnicodeString, the compiler will either:
a) convert it correctly to UnicodeString if the encoding used is UTF-8, or
b) call a system-specific function to convert the string to an array of WideChars (in which case the correctness of the program depends on support for that specific encoding on the target system).
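Mark's question about the single byte $A3 can be made concrete. The sketch below, in Python for illustration, decodes that byte under two example code pages and under UTF-8; cp1252 and cp437 are arbitrary choices for the example and say nothing about what any given system uses.

```python
byte = bytes([0xA3])

# Interpreted in Windows-1252 the byte is the pound sign, U+00A3.
in_cp1252 = byte.decode("cp1252")

# Interpreted in the old DOS code page 437 it is a different character.
in_cp437 = byte.decode("cp437")

# On its own the byte is not valid UTF-8: 0xA3 is a continuation byte.
try:
    byte.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The UTF-8 encoding of the pound sign is the two-byte sequence C2 A3.
pound_utf8 = "\u00a3".encode("utf-8")

print(hex(ord(in_cp1252)), hex(ord(in_cp437)), valid_utf8, pound_utf8)
```

This is why a later conversion to a 16-bit string gives a sensible result only if the code page assumed for the 8-bit string matches the one the author had in mind.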
Re: [fpc-devel] utf8 in 2.6.0
Martin Schreiber wrote:
> but I fear we cannot use that information for development with Free Pascal because:
>
> "The string is represented internally as a Unicode string encoded as UTF-16. Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters not in the BMP require 4 bytes."
>
> and
>
> "A control string is a sequence of one or more control characters, each of which consists of the # symbol followed by an unsigned integer constant from 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, and denotes the character corresponding to a specified code value. Each integer is represented internally by 2 bytes in the string. This is useful for representing control characters and multibyte characters."
>
> which seems to be different from Free Pascal.

Correction: You're right, Delphi treats control characters as UTF-16 codes, whereas FPC treats them as byte values (if less than 256).

I had already noticed the possible problem that the FPC interpretation of control characters is context-sensitive. This leads to write-only code, because a change of the $codepage would require changing all control characters in that unit accordingly. In addition, removing or adding control characters above 255 also leads to a different interpretation of the remaining control characters *and* to a different internal representation.

DoDi
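The difference DoDi describes can be modelled in a few lines. The Python sketch below is only a model of the behaviour as stated in this thread, not compiler source: `delphi_style` and `fpc_style` are hypothetical names, and little-endian storage of 16-bit units is an assumption.

```python
def delphi_style(codes):
    """Every #nn is one UTF-16 code unit (2 bytes), per the quoted Delphi text."""
    return b"".join(c.to_bytes(2, "little") for c in codes)

def fpc_style(codes):
    """Model of the FPC behaviour described in the thread (an assumption):
    if all values are below 256 they are raw bytes; if any value is above
    255 the whole literal becomes a string of 16-bit units."""
    if all(c < 256 for c in codes):
        return bytes(codes)
    return b"".join(c.to_bytes(2, "little") for c in codes)

# #$C7#$BE: two raw bytes in the FPC model, two 16-bit units in Delphi's.
print(fpc_style([0xC7, 0xBE]), delphi_style([0xC7, 0xBE]))

# Adding one value above 255 changes how the other two are stored.
print(fpc_style([0xC7, 0xBE, 0x01FE]))
```

In the model, appending #$01FE silently turns the existing $C7 and $BE from bytes into 16-bit units, which is the context sensitivity DoDi warns about.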