Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Jonas Maebe wrote:
> If you'd want to limit the length to 2GB on 64 bit systems. I also don't know whether all 64 bit CPUs support atomic operations on 32 bit entities (for the reference count).

Something might be said for compatibility with 32 bit implementations, so that the maximum length is the same, but I don't really have an opinion on whether limiting the size for compatibility reasons is actually a good idea.

At least Intel and PowerPC do have atomic operations on smaller sizes.

Intel: System Programming Guide vol. 3, section 8.1.1 says byte, word, doubleword and quadword (since Pentium) accesses are all atomic if they are naturally aligned (word = 16 bits). Section 8.1.2.2 notes that LOCK is also best used on naturally aligned boundaries for 8/16/32/64 bit accesses for best performance.

PowerPC: pem64, 2005mar, section 5.1.2 says byte, halfword, word and doubleword (64 bit only) accesses are atomic if they are naturally aligned (word = 32 bits). lwarx/stwcx still exist on 64 bit implementations.

So both guarantee that smaller reads/writes are atomic.

Micha
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
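As a rough illustration of the term both manuals rely on: an access is "naturally aligned" when its address is a multiple of its size. The helper name below is hypothetical, and the sketch only models the address arithmetic, not the atomicity guarantee itself.

```python
def is_naturally_aligned(address: int, size: int) -> bool:
    """Return True if an access of `size` bytes at `address` is naturally aligned."""
    # Natural alignment is only defined here for power-of-two sizes (1, 2, 4, 8, ...).
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return address % size == 0


# A 32 bit reference count at offset 4 is naturally aligned ...
print(is_naturally_aligned(4, 4))
# ... but a 64 bit field at the same offset is not.
print(is_naturally_aligned(4, 8))
```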
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Thu, November 12, 2009 08:56, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
>> supported codepages in the next version of MS Windows (or that they don't support a different list in some special version, like a version for the Chinese market) breaking your selection of 50 free values in Windows range?
>
> In that unlikely case, change the range.
>
>> That raises a question whether incompatibility between two FPC versions
>
> Incompatibility how exactly? Two different FPC versions are already not compatible.

If you need to change the used range between e.g. FPC 2.6.x and 2.8.x (due to MS extending their use of the codepage values into the range we decided to use in FPC), this makes 2.6.x and 2.8.x incompatible with each other, right?

>> is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)...
>
> That would be source incompatibility, and therefore much worse.

First, this may be the case for compatibility between two FPC versions too. Second, the relation between the numeric values appearing in FPC sources and how the compiler translates the sources to the internal representation in memory (which is possibly only valid for the particular platform) is something that may not be the same (depending on the use cases, of course).

>>> Like about 50/280. That's the point of most used. For the less likely ones, define constants to the windows codepages.
>>
>> I don't understand what you mean by define constants to the windows codepages.
>
> The 16-bit range is split between a short FPC range and a long Delphi/Windows range. Rarely used codepages use the windows codepage number, and if foreign OSes support that, they must implement a windows2local codepage number conversion.

As far as I'm concerned, I'm fine with providing a translation table between Windows codepages and individual platforms (e.g. OS/2), but I'm less comfortable with having to use this translation at runtime on all platforms except Windows, and I'm somewhat worried about not having a solution for supporting character sets which may be used e.g. for the console on non-Windows platforms but are not supported by Windows (have a look at the URL sent by Jonas yesterday for Mac OS X; without having performed a complete comparison, it seemed to contain some character sets not listed on the MSDN page for Windows).

>> of certain constants, I can imagine that we should be able to find a gap in the windows character set numbering to cover at least all the character sets registered by IANA.
>
> Implementing at all only makes sense if OSes implement them exactly. Several Windows codepages might map to corresponding IANA sets.

Do you have some examples of this case?

>> However, we need to provide mapping between the MS Windows character set number and the native character set number for all character set numbers defined in Windows and supported by the particular platform, otherwise the compatibility argument doesn't hold any longer, does it?
>
> Just like that you must be able to map the IANA sets to actually supported sets on all platforms.

Yes, absolutely. The only potential advantage of IANA numbers would be ensured compatibility across future FPC versions, without the risk that we need to remap the codepage numbers in the future due to MS or some other vendor changing the use of their platform specific constants. I don't say that this is a must or necessarily the best option, just an option we may want to consider depending on the use cases (see below).

>>> Note that is all just a guesstimate on the size of the free ranges. But I rather not expand that too much.
>>
>> I'm pretty sure that Windows actually supports fewer character sets than what is defined in IANA.
>
> Since Windows already uses word values, there should be fairly large gaps.

Looking at the MSDN documentation (http://msdn.microsoft.com/en-us/library/dd317756.aspx), there are 152 values defined altogether and there's currently e.g. just a single value used in the 3 range, no value in 4, nothing between 38 and 436 (probably rather unlikely to change, I'd expect changes rather in other areas), nothing between 1362 and , etc.

> If the ranges are large enough we can try to fit them in all somewhere. But this means the lesser used codepages are also in twice, blowing up lookuptables or codepages.

Yes. Either at compile time (where it makes no difference at all), or possibly also at runtime, where this means something like 1600 bytes on 32-bit platforms (assuming 200 records with 2 fields of 4 bytes each).

>> as I understand it, at least console character set information is provided using charset name provided in an environment variable there)?
>
> Put them in the table too, for Unix.

From a certain perspective, these text versions may be useful for all platforms (imagine HTML character set declarations). However, there's a risk that they may not be used completely
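The 1600 byte estimate in the message above can be checked with a few lines. This is only a sketch of the arithmetic: the record layout (two 32 bit unsigned fields, a Windows codepage number and a native charset number) is an assumption taken from the "2 fields of 4 bytes each" wording, and `RECORD`, `RECORD_COUNT` and `table_size` are hypothetical names.

```python
import struct

# Assumed record: (Windows codepage number, native charset number),
# each stored as a 32 bit unsigned integer without padding.
RECORD = struct.Struct("<II")
RECORD_COUNT = 200  # the estimate used in the message above


def table_size(count: int = RECORD_COUNT) -> int:
    """Size in bytes of a flat lookup table holding `count` records."""
    return count * RECORD.size


print(RECORD.size)   # bytes per record
print(table_size())  # bytes for the whole table
```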
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Micha Nelissen wrote:
> Intel: System Programming Guide vol. 3, section 8.1.1 says byte, word, doubleword and quadword (since Pentium) accesses are all atomic if they are naturally aligned (word = 16 bits). Section 8.1.2.2 notes that LOCK is also best used on naturally aligned boundaries for 8/16/32/64 bit accesses for best performance.

Hmm, note: it seems that Intel's movq, which moves a quadword (64 bit) between registers or loads/stores data, is usable only with XMM/MMX registers, not with the regular registers? This would mean there is no atomic load/store for the native-sized regular registers (rax, rcx, etc.) on x86_64? Is this right?

Micha
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Tomas Hajny said:
>> Incompatibility how exactly? Two different FPC versions are already not compatible.
>
> If you need to change the used range between e.g. FPC 2.6.x and 2.8.x (due to MS extending their use of the codepage values into the range we decided to use in FPC), this makes 2.6.x and 2.8.x incompatible to each other, right?

I don't see how. Not if the symbolic constants are used, which is the whole point.

Note that this is already a slim chance. Especially if the ranges are as big as you say.

>>> is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)...
>>
>> That would be source incompatibility, and therefore much worse.
>
> First, this may be the case for compatibility between two FPC versions too.

Not if we specify from the start that the RTL predefined symbolic constants are the only supported values, and that the windows codepages _might_ work depending on platform (e.g. to avoid having too much overhead on embedded targets). IOW numeric values are undefined, but might map to windows codepages if the platform supports it.

We have that luxury. The values in Delphi code with windows codepages are already out there, and we have no real power to change that.

Sure, you can ultimately convince Delphi open source projects to use FPC values and define them for themselves, but that is hard, and a problem that you face with each new piece of Delphi code. Again and again.

> Second, the relation between the numeric values appearing in FPC sources and how the compiler translates the sources to the internal representation in memory (which is possibly only valid for the particular platform) is something that may not be the same (depending on the use cases, of course).

Yes, you could lay an xlat layer within the parser (source number to unicode encoding word mapping). That will break much less code (only Delphi assembler code), but pulls the lot into the compiler. Not desirable IMHO, and moreover, I'm not convinced there really is a problem that warrants such draconian measures in the first place.

>> and if foreign OSes support that, they must implement a windows2local codepage number conversion.
>
> As far as I'm concerned, I'm fine with providing a translation table between Windows codepages and individual platforms (e.g. OS/2), but I'm less comfortable with having to use this translation at runtime under all platforms except for Windows and I'm somewhat worried about not having a solution for supporting character set which may be used e.g. for console on non-windows platforms but are not supported by Windows (have a look at the URL sent by Jonas yesterday for Mac OS X; without having performed complete comparison, it seemed to contain some character sets not listed on the MSDN page for Windows).

The lookup only happens at the iconv moment, which is orders of magnitude more expensive.

The example with windows code pages was a bit windows centric, but it was only an example. The transformation from the word to whatever the encoding procedure uses is platform dependent. In the windows case this means a lookup has to be inserted to handle the FPC predefined ones. For other platforms a lookup has to be inserted no matter what.

>>> of certain constants, I can imagine that we should be able to find a gap in the windows character set numbering to cover at least all the character sets registered by IANA.
>>
>> Implementing at all only makes sense if OSes implement them exactly. Several Windows codepages might map to corresponding IANA sets.
>
> Do you have some examples of this case?

I never brought up IANA :-) The point is while IANA might be a standard, the APIs probably don't use IANA numbers as

>>> compatibility argument doesn't hold any longer, does it?
>>
>> Just like that you must be able to map the IANA sets to actually supported sets on all platforms.
>
> Yes, absolutely. The only potential advantage of IANA numbers would be ensured compatibility across future FPC versions without risk that we need to remap the codepage numbers in the future due to MS or some other vendor changing use of their platform specific constants. I don't say that this is a must or necessarily the best option, just an option we may want to consider depending on the use cases (see below).

If you guarantee numeric compatibility for the FPC side. Something I don't plan to do. Only symbolic. (FPC_IANA1_English or whatever. Not the corresponding numeric value)

>> If the ranges are large enough we can try to fit them in all somewhere. But this means the lesser used codepages are also in twice, blowing up lookuptables or codepages.
>
> Yes. Either at compile time (where it makes no difference at all), or possibly also at runtime where this means something like 1600 bytes on 32-bit platforms (assuming 200 records with 2 fields of 4 bytes each).

And more if the target codepages identifier is a string, yes, less if you can build
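To make the proposal in this message concrete, here is a minimal sketch of a 16 bit codepage space split into a short FPC range and a long Windows range, where only symbolic constants are supported for the FPC part. Everything named here is an assumption for illustration: the range `0xFF00..0xFFFF`, the constants `FPC_CP_UTF8` and `FPC_CP_LATIN1`, and the `resolve` helper do not come from the thread. The three Windows codepage identifiers (1252, 28591, 65001) are real ones, and the charset names are resolved through Python's own codec registry, standing in for the platform lookup.

```python
import codecs

# Hypothetical FPC-reserved range inside the 16 bit codepage space.
FPC_RANGE = range(0xFF00, 0x10000)

# Hypothetical symbolic constants; callers use the names, never the numbers,
# so the numbers could be moved in a later version without breaking source code.
FPC_CP_UTF8 = 0xFF00
FPC_CP_LATIN1 = 0xFF01

FPC_TABLE = {FPC_CP_UTF8: "utf-8", FPC_CP_LATIN1: "iso-8859-1"}

# Windows codepage identifiers mapped to names the local platform understands.
WINDOWS_TABLE = {1252: "cp1252", 28591: "iso-8859-1", 65001: "utf-8"}


def resolve(codepage: int) -> str:
    """Map a 16 bit codepage word to a local codec name, or raise LookupError."""
    if not 0 <= codepage <= 0xFFFF:
        raise ValueError("codepage must fit in 16 bits")
    table = FPC_TABLE if codepage in FPC_RANGE else WINDOWS_TABLE
    if codepage not in table:
        # Windows codepages "might" work depending on the platform; here they do not.
        raise LookupError(f"codepage {codepage} is not supported on this platform")
    return codecs.lookup(table[codepage]).name


print(resolve(FPC_CP_UTF8), resolve(65001))
```

The lookup runs once per conversion, which matches the point made above that its cost is small next to the conversion itself.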
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Thu, November 12, 2009 14:19, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
>>> Incompatibility how exactly? Two different FPC versions are already not compatible.
>>
>> If you need to change the used range between e.g. FPC 2.6.x and 2.8.x (due to MS extending their use of the codepage values into the range we decided to use in FPC), this makes 2.6.x and 2.8.x incompatible to each other, right?
>
> I don't see how. Not if the symbolic constants are used, which is the whole point. Note that this is already a slim chance. Especially if the ranges are as big as you say.

OK, I see. That hasn't been my understanding. Still, are we sure that it only has an impact on the usability of existing source files and nothing else? Is it really certain that the codepage number is never written into a file when storing the strings? Otherwise compatibility at the level of numeric values may be necessary.

>>>> is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)...
>>>
>>> That would be source incompatibility, and therefore much worse.
>>
>> First, this may be the case for compatibility between two FPC versions too.
>
> Not if we specify from the start that the RTL predefined symbolic constants are the only supported values, and that the windows codepages _might_ work depending on platform (e.g. to avoid having too much overhead on embedded targets). IOW numeric values are undefined, but might map to windows codepages if the platform supports it.

I'm not sure if we will really manage to get this message through. :-( People who are interested in working at the level of individual codepages would be exactly those who would probably never take care of translating the codepage value (as required by Delphi) into some symbolic constant (not supported by Delphi)...

> We have that luxury. The values in Delphi code with windows codepages are already out there, and we have no real power to change that.

We have the luxury, but in that case we can almost equally well skip this definition of symbolic constants altogether, because I suspect that hardly anyone will use them anyway. Still, I'm more concerned about the (unnecessary) runtime overhead.

> Sure, you can ultimately convince Delphi open source projects to use FPC values and define them for themselves, but that is hard, and a problem that you face with each new piece of Delphi code. Again and again.

Completely true.

>> Second, the relation between the numeric values appearing in FPC sources and how the compiler translates the sources to the internal representation in memory (which is possibly only valid for the particular platform) is something that may not be the same (depending on the use cases, of course).
>
> Yes, you could lay an xlat layer within the parser (source number to unicode encoding word mapping). That will break much less code (only Delphi assembler code), but pulls the lot into the compiler. Not desirable IMHO, and moreover, I'm not convinced there really is a problem that warrants such draconian measures in the first place.

What draconian measure? Per platform mapping? The translation has much lower impact if performed once at compile time than every time at runtime, right?

>>> and if foreign OSes support that, they must implement a windows2local codepage number conversion.
>>
>> As far as I'm concerned, I'm fine with providing a translation table between Windows codepages and individual platforms (e.g. OS/2), but I'm less comfortable with having to use this translation at runtime under all platforms except for Windows and I'm somewhat worried about not having a solution for supporting character set which may be used e.g. for console on non-windows platforms but are not supported by Windows (have a look at the URL sent by Jonas yesterday for Mac OS X; without having performed complete comparison, it seemed to contain some character sets not listed on the MSDN page for Windows).
>
> The lookup only happens at the iconv moment, which is orders of magnitude more expensive. The example with windows code pages was a bit windows centric, but it was only an example. The transformation from the word to whatever the encoding procedure uses is platform dependent. In the windows case this means a lookup has to be inserted to handle the FPC predefined ones. For other platforms a lookup has to be inserted no matter what.

Yes. However, possibly at compile time only (if source file compatibility is the only issue we are concerned about; I'm still not clear about that).

>>>> of certain constants, I can imagine that we should be able to find a gap in the windows character set numbering to cover at least all the character sets registered by IANA.
>>>
>>> Implementing at all only makes sense if OSes implement them exactly. Several Windows codepages might map to corresponding IANA sets.
>>
>> Do you have some examples of this case?
>
> I never brought up IANA :-) The point is while IANA
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:
> It supports now Linux x86-64. I used about 110 hours for the adaption. Problem is gdb and dwarf on x86-64 (display of var parameters) and freezing of the application if compiled with -gl in case of an exception. I can't understand that Lazarus lives with this since years.

[a bit off topic I guess]

fpGUI and, as far as I know, Lazarus have no problems with the -gl compiler parameter on 64-bit platforms. I can definitely vouch for fpGUI in this regard, because I always use -gl and have used 64-bit FPC for the last 3-4 months. So your problem seems to be an issue in MSEgui, not a general compiler issue.

As for the 'var parameter' issue: I do notice that it displays the address instead of the value, which is weird. Then again, I must admit that I don't use GDB much to debug my applications. I use the tiLog units from the tiOPF project (seeing that most of my apps are tiOPF based) and can log to Console, File, GUI Log Window etc. tiOPF's log support is multi-threaded, can be cached so that it does not hinder app performance, and has various log types which can be toggled at compile time or runtime to filter what is logged. I personally find the tiOPF log support a lot more handy and reliable than GDB. At least this applies to the projects I work on; your mileage may vary.

I still need to investigate the DebugIntf unit and the GUI debug server idea. It looks promising.

> I asked in the MSEide newsgroup about the interest in 64 bit, the result was near zero.

Eventually all desktop OSes will move to 64bit. So want it or not, it is going to be required at some point. :)

Regards,
- Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
thaddy wrote:
> Just to make a small point: the choice for UTF16 was made because of market reasons, not technical ones. Chinese, Korean, Japanese markets are important for Delphi. Try to figure out how to fit that in UTF8. Just a thought.

I have to agree with Marco there. Delphi simply follows Microsoft like a moth to a flame. What is the technical issue with UTF-8 compared to UTF-16? Both are Unicode standards. I personally feel UTF-8 is a much better idea than UTF-16 though.

Regards,
- Graeme -
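One measurable difference between the two encodings is storage size, which depends on the script. The sketch below only compares encoded byte counts for two sample strings; it takes no position on which encoding is preferable, and `encoded_sizes` is a hypothetical helper name.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for `text`, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


ascii_sample = "hello"
# Three CJK characters (U+65E5, U+672C, U+8A9E), written as escapes
# so the source file itself stays plain ASCII.
cjk_sample = "\u65e5\u672c\u8a9e"

print(encoded_sizes(ascii_sample))
print(encoded_sizes(cjk_sample))
```

ASCII text takes one byte per character in UTF-8 and two in UTF-16, while these CJK characters take three bytes in UTF-8 and two in UTF-16. Both encodings can represent the text; only the size differs.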
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Florian Klaempfl wrote:
>> and therefore the RTL and compiler procedures are identical with exception of the initialization of 4 bytes with a constant?
>
> Well, two times two byte ;)

I have not looked at the cpstrnew branch, but what are the 4 bytes used for in each string? Couldn't the individual bits in a 1 byte value be used? That would reduce the extra memory overhead by 75%?

Regards,
- Graeme -
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Graeme Geldenhuys wrote:
> I have not looked at the cpstrnew branch, but what is the 4 bytes used for in each string? Couldn't the individual bits in 1 byte value be used? That will reduce the extra memory overhead by 75%?

It won't. 4 bytes will be used anyway because of alignment. On x86_64, it is even 8 bytes.

Sergei
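The padding effect described above can be observed with any record layout that follows the host's C alignment rules. The sketch below is only an analogy: it uses `ctypes` structures with two word-sized fields (an assumed reference count and length) rather than the actual FPC string header, and the class names are hypothetical.

```python
import ctypes


class Header(ctypes.Structure):
    # Assumed header: two word-sized fields.
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("length", ctypes.c_ssize_t),
    ]


class HeaderWithTag(ctypes.Structure):
    # Same header plus a single extra byte of metadata.
    _fields_ = [
        ("refcount", ctypes.c_ssize_t),
        ("length", ctypes.c_ssize_t),
        ("tag", ctypes.c_uint8),
    ]


# The one byte field costs a whole alignment unit, because the record
# is padded up to a multiple of its strictest field alignment.
extra = ctypes.sizeof(HeaderWithTag) - ctypes.sizeof(Header)
print(extra, ctypes.alignment(ctypes.c_ssize_t))
```

On a typical 32 bit host both printed values are 4, and on a typical 64 bit host both are 8, in line with the numbers in the message.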
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:
> OK, so you say that the processing of the new and the current UnicodeString and therefore the RTL and compiler procedures are identical with exception of the initialization of 4 bytes with a constant?

Now that is exciting.

Of course, with any operation that uses two strings as input, the encoding and character size DWord (32 bits, 24 bits relevant) needs to be checked if not equal. But this does not seem like much overhead to me.

-Michael
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Michael Schnell wrote:
> Martin Schreiber wrote:
>> OK, so you say that the processing of the new and the current UnicodeString and therefore the RTL and compiler procedures are identical with exception of the initialization of 4 bytes with a constant?
>
> Now that is exciting. Of course with any operation that uses two strings as an input, the encoding and character size DWord (32 Bits, 24 Bits relevant), needs to be checked if not equal. But this does not seem like much overhead to me.

Only if the string type is RawByteString. E.g. a procedure taking a unicode string as parameter needs no additional checking.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Michael Schnell wrote:
> Florian Klaempfl wrote:
>> No other string types being involved especially things like RawByteString.
>
> If you additionally implement the encoding Type RawWordString, Martin should be happy.

No. RawByteString means simply: encoding unknown, the string is just a couple of bytes (which could also form words or dwords) described by the encoding and char size fields.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Marco van de Voort wrote:
> While I think it would be best to use native encoding on all platforms as much as possible, that is an opinion. However not using native encoding for general processing is nuts. So we need the UTF8 type anyway.

Of course that is true.

So IMHO (at least) these encoding types should be supported:

- RawDWordString
- RawWordString (handled like good old WideStrings? (*))
- RawByteString (handled like good old Strings? (*))
- ANSI 8 Bit code page (*)
- UTF-8
- UTF-16
- 32 Bit Unicode (is this the same as RawDWordString, or is support for surrogate pairs and such necessary?)

(*) What about comparing "if str1=str2 then ..." regarding case sensitivity?

What about the appropriate character types (e.g. char1 := String[1])? I suppose there should be

- ANSICHAR: 8 Bits Type (for ANSI 8 Bit Strings) and
- UnicodeChar: 32 Bit Type (for all Unicode encoded strings)

supporting automatic conversion (if possible). The Raw Strings should output Byte, Word and DWORD appropriately.

-Michael
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Florian Klaempfl wrote:
> The constant encoding the code page requires 2 bytes,

Could those become enum values instead, which means you can store up to 255 different code-page types in 1 byte. 255 should be sufficient to cover all available code-page constants (I think).

Or couldn't that maybe be reduced to 6 bits for code page enum values, giving a total of 63 different values for code page types (still sufficient I think). And then the remaining 2 bits for element size. So total usage is still 1 byte.

> require 2 bits only, but alignment issues waste the remaining 14 bit anyways.

Sorry, I'm a bit rusty on bit alignment issues. Any URL you can suggest where I could read further about this?

Regards,
- Graeme -
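The one byte layout suggested above (6 bits for a code page enum value, 2 bits for the element size) can be sketched as follows. The function names and the convention that the 2 bit field holds a small code rather than the size itself are assumptions made for illustration, not something taken from the cpstrnew branch.

```python
CODEPAGE_BITS = 6  # enum index of the code page
ELEMENT_BITS = 2   # assumed code for the element size, e.g. 0..3 for 1, 2, 4, 8 bytes


def pack_tag(codepage_index: int, element_code: int) -> int:
    """Pack both fields into a single byte value (0..255)."""
    if not 0 <= codepage_index < (1 << CODEPAGE_BITS):
        raise ValueError("codepage_index does not fit in 6 bits")
    if not 0 <= element_code < (1 << ELEMENT_BITS):
        raise ValueError("element_code does not fit in 2 bits")
    return (codepage_index << ELEMENT_BITS) | element_code


def unpack_tag(tag: int) -> tuple[int, int]:
    """Split a packed byte back into (codepage_index, element_code)."""
    return tag >> ELEMENT_BITS, tag & ((1 << ELEMENT_BITS) - 1)


tag = pack_tag(5, 1)
print(tag, unpack_tag(tag))
```

Whether one byte is enough depends entirely on how many code pages must be distinguishable, which is the question taken up later in the thread.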
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Sergei Gorelkin wrote:
> It won't. 4 bytes will be used anyway because of alignment. On x86_64, it is even 8 bytes.

Does that mean in the cpstrnew branch and on x86_64 systems, the UTF-8 string 'a' will be 9 bytes long?

Regards,
- Graeme -
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 10:07, Graeme Geldenhuys wrote:
> Sergei Gorelkin wrote:
>> It won't. 4 bytes will be used anyway because of alignment. On x86_64, it is even 8 bytes.
>
> Does that mean in the cpstrnew branch and on x86_64 systems, the UTF-8 string 'a' will be 9 bytes long?

No, 25 bytes. The plain ansistring 'a' is already 17 bytes on x86_64 platforms today.

Jonas
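The two figures can be reproduced with simple arithmetic under stated assumptions. The breakdown below is a guess at how the numbers arise, not something spelled out in the message: it assumes two word-sized header fields (reference count and length) of 8 bytes each on x86_64, one byte of character data, and 8 extra header bytes for the new fields after alignment, as mentioned earlier in the thread. `string_bytes` is a hypothetical helper.

```python
def string_bytes(payload: int, word_size: int = 8, extra_header: int = 0) -> int:
    """Bytes for a string: assumed header, optional extra header, and character data."""
    header = 2 * word_size  # assumed: reference count + length
    return header + extra_header + payload


# Plain ansistring 'a' on x86_64 under these assumptions.
print(string_bytes(1))
# The same string with 8 additional (aligned) header bytes.
print(string_bytes(1, extra_header=8))
```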
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Florian Klaempfl wrote:
> No. RawByteString means simply: encoding unknown, the string is just a couple of bytes (which could also form words or dwords) described by the encoding and char size fields.

I see. But this is exactly what I meant: RawByteString, RawWordString, RawDWordString and RawQWordString are distinguishable by the character-size (1, 2, 4, 8) tag. x := str1[1] is handled appropriately.

I feel that what Martin wants (the old Widestring type) would be a two-byte RawByteString. But if there are differences (e.g. with a non case sensitive "if str1=str2 then ...") an additional WideChar encoding type might be sensible.

-Michael
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 10:37, Michael Schnell wrote:
> Of course that is true. So IMHO (at least) theses encoding types should be supported:

Please read this document first: http://edn.embarcadero.com/article/38980

> - RawDWordString
> - RawWordString (handled like good old WideStrings ?(*) )
> - RawByteString (handled like good old Strings ?(*) )

RawWordString and RawDWordString don't make any sense. All strings with 32 bit elements are UTF-32. Those with 16 bit elements can be UTF-16 big or little endian, or UCS-2, but I can't see why any programmer would want a routine to accept strings in any of those formats but not in any other format. Normally, you'd pick one of those (e.g., utf16string) and be done with it.

E.g., MSEGui could perfectly well use utf16string everywhere (which would be equivalent to the current unicodestring except for the extra 4/8 bytes), and it would work pretty much the same as it does today.

Jonas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Jonas Maebe wrote:
> RawWordString and RawDWordString don't make any sense. All Strings with 32 bit elements are UTF-32.

What about case sensitivity with "if str1=str2 then ..."?

> Those with 16 bit elements can be UTF-16 big or little endian, or UCS-2, but I can't see why any programmer would want a routine to accept strings in any of those formats but not in any other format.

Strings are a perfect tool to just store some values, disregarding any encoding. Something like dynamic arrays, but with the benefit of reference counting and of using standard objects like TStringList.

AFAIK, the recent Delphi provides a completely new object for that purpose, which I think is not necessary at all, and which is breaking a lot of existing code.

-Michael
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Martin Schreiber said:
> On Tuesday 10 November 2009 21:38:33 Marco van de Voort wrote:
>> The only problem is the db-aware part and the few other widgets that can have 10 elements and more, like treeview. There mass conversion can hurt, e.g. when loading the data into the widget.
>
> That's the reason why MSEgui stores DB strings as UnicodeString in tmsebufdatset and the other DB access components. Converting utf-8 -> utf-16 -> utf-8 once while receiving/sending the data over the wire is fast compared to the data transmission, you will not notice the small difference. ;-)

Who says db is over a wire? It could be a local socket. But I haven't benchmarked it; it is just the only scenario I can think of where it is noticeable, if your widgets are constructed so that only the visible set of items is ever copied to the widgetset (then it doesn't matter if the widgetset is utf8 or utf16).

> The MSEgui list and tree components store stringdata as UnicodeString of course.

I don't see the "of course". It would not be my choice for non-visual classes and/or classes containing many nodes.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 11:04, Michael Schnell wrote:
> What about case sensitivity with if str1=str2 then ... ?

The = operator for strings has always performed a simple byte-wise comparison until now, and presumably keeps acting the same in Delphi 2009 with the new UnicodeStrings. Custom string comparisons require helper routines (such as CompareText and friends), because there are a lot of different options that you can require (numeric vs lexical vs raw byte comparisons, case sensitive/insensitive, ignoring diacritics or not, treating decomposed and composed code points the same or not, ...). You cannot shoehorn all of that into a single operator.

> Strings are a perfect tool to just store some values, disregarding any encoding. Something like dynamic arrays but with the benefit of reference counting

Dynamic arrays are reference counted.

> and using standard objects like TStringList.

It would also be useful if you could store strings into a set (transparent hash set!), but that does not mean that every possible data structure for every possible data type should be built into the language. There is no inherent reason why you cannot easily create tstringlist-like functionality for dynamic arrays.

> AFAIK, the recent Delphi provides a completely new object for that purpose, which I do think is not necessary at all, and braking a lot of existing code.

I have not heard about this. Not to mention that changing strings into a hypothetical rawdwordstring would also break existing code, so I don't see how that's better or worse in that respect.

Jonas
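The point about one operator not being able to cover every comparison policy can be sketched with three separate helpers. This is an illustration in Python using its standard `unicodedata` module, not the Pascal RTL routines named above; the function names are hypothetical.

```python
import unicodedata


def bytewise_equal(a: str, b: str) -> bool:
    """Raw comparison of the encoded bytes, analogous to a plain = on strings."""
    return a.encode("utf-8") == b.encode("utf-8")


def caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison."""
    return a.casefold() == b.casefold()


def canonical_equal(a: str, b: str) -> bool:
    """Treat composed and decomposed code point sequences as the same text."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"     # e with acute accent as one code point
decomposed = "e\u0301"  # plain e followed by a combining acute accent

print(bytewise_equal("Free Pascal", "FREE PASCAL"), caseless_equal("Free Pascal", "FREE PASCAL"))
print(bytewise_equal(composed, decomposed), canonical_equal(composed, decomposed))
```

Each helper answers a different question about the same pair of strings, which is why such policies usually live in named routines rather than in the equality operator.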
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Tuesday 10 November 2009 19:08:45 Florian Klaempfl wrote:
> Martin Schreiber wrote:
>> On Tuesday 10 November 2009 18:33:54 Florian Klaempfl wrote:
>>> Did you look into the code in cpstrnew branch?
>> I did.
>>> And which changes are exactly the reason for your concerns?
>> More checks
> Where? A pure unicodestring routine won't get additional checks.
>> and more complicated address calculation.
> Where? Adding 16 instead of 12 makes no difference. The major difference will be the initialization of the additional fields and the increased memory requirement of 4 bytes per string.

I try to prove the exciting statement. How can I build a startup compiler for the cpstrnew branch or how to compile the cpstrnew branch?

Martin
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wed, November 11, 2009 09:58, Graeme Geldenhuys wrote:
> Florian Klaempfl wrote:
>> The constant encoding the code page requires 2 bytes,
> Could those become enum values instead, which means you can store up to 255 different code-page types in 1 byte. 255 should be sufficient to cover all available code-page constants (I think). Or couldn't that maybe be reduced to 6 bits for code page enum values, giving a total of 63 different values for code page types (still sufficient I think). And then the remaining 2 bits for element size. So total usage is still 1 byte.

I'm afraid that your assumption about the number of codepages is incorrect. The MIBENUM value defined in list at http://www.iana.org/assignments/character-sets goes up to 2059 at the moment.

Tomas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Jonas Maebe wrote:
>> Does that mean in the cpstrnew branch and on x86_64 systems, the UTF-8 string 'a' will be 9 bytes long?
> No, 25 bytes. The plain ansistring 'a' is already 17 bytes on x86_64 platforms today.

Is that because of sizeint? Wouldn't longint be long enough?

Micha
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:
> On Tuesday 10 November 2009 19:08:45 Florian Klaempfl wrote:
>> Martin Schreiber wrote:
>>> On Tuesday 10 November 2009 18:33:54 Florian Klaempfl wrote:
>>>> Did you look into the code in cpstrnew branch?
>>> I did.
>>>> And which changes are exactly the reason for your concerns?
>>> More checks
>> Where? A pure unicodestring routine won't get additional checks.
>>> and more complicated address calculation.
>> Where? Adding 16 instead of 12 makes no difference. The major difference will be the initialization of the additional fields and the increased memory requirement of 4 bytes per string.
> I try to prove the exciting statement. How can I build a startup compiler for the cpstrnew branch or

I use the 2.2.4/2.4.0 ide to build the compiler, so make in the cpstrnew branch compiler dir should work using 2.2.4/2.4.0

> how to compile the cpstrnew branch?

Since it's a branch, I'm not sure if usual building works. We're still at the point to add new stuff and rework things so it might be heavily broken.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 13:47, Micha Nelissen wrote:
> Jonas Maebe wrote:
>> No, 25 bytes. The plain ansistring 'a' is already 17 bytes on x86_64 platforms today.
> Is that because of sizeint?

Yes.

> Wouldn't longint be long enough?

If you'd want to limit the length to 2GB on 64 bit systems. I also don't know whether all 64 bit CPUs support atomic operations on 32 bit entities (for the reference count).

Jonas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wed, November 11, 2009 13:42, Tomas Hajny wrote:
> On Wed, November 11, 2009 09:58, Graeme Geldenhuys wrote:
>> Florian Klaempfl wrote:
>>> The constant encoding the code page requires 2 bytes,
>> Could those become enum values instead, which means you can store up to 255 different code-page types in 1 byte. 255 should be sufficient to cover all available code-page constants (I think). Or couldn't that maybe be reduced to 6 bits for code page enum values, giving a total of 63 different values for code page types (still sufficient I think). And then the remaining 2 bits for element size. So total usage is still 1 byte.
> I'm afraid that your assumption about the number of codepages is incorrect. The MIBENUM value defined in list at http://www.iana.org/assignments/character-sets goes up to 2059 at the moment.

To avoid potential misunderstanding - the number of codepages is not 2059 (there are about 200 character sets listed there at the moment). Still, the current number already goes beyond the 63 and in the future it may possibly grow beyond 255 too. Moreover, this MIBENUM number is the only standardized reference value as far as I know; apart from that, the various codepage numbers are platform specific (btw, I still wonder how this platform specific mapping is performed in the new concept).

Tomas
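Editor's illustration of the one-byte proposal and why the numbers do not fit. This is a rough sketch, not the cpstrnew layout: the helper names (PackTag, TagCodePage, TagElemSize) and the idea of storing the element size as a 2-bit code are assumptions made for the example; only the figures 6 bits, 2 bits and 2059 come from the thread.

```pascal
program cpbits;
{$mode objfpc}{$h+}

const
  CodePageBits   = 6;                          // bits proposed for the code page index
  MaxPackedIndex = (1 shl CodePageBits) - 1;   // largest index that fits in 6 bits (63)
  HighestMibEnum = 2059;                       // highest MIBenum value cited in the thread

// Pack a code page index (0..63) and a 2-bit element size code (0..3) into one byte.
function PackTag(cpindex, elemsizecode: byte): byte;
begin
  PackTag := (cpindex and MaxPackedIndex) or ((elemsizecode and 3) shl CodePageBits);
end;

// Extract the low 6 bits: the code page index.
function TagCodePage(tag: byte): byte;
begin
  TagCodePage := tag and MaxPackedIndex;
end;

// Extract the high 2 bits: the element size code.
function TagElemSize(tag: byte): byte;
begin
  TagElemSize := tag shr CodePageBits;
end;

var
  tag: byte;
begin
  tag := PackTag(5, 1);
  writeln('code page index  : ', TagCodePage(tag));
  writeln('element size code: ', TagElemSize(tag));

  // A raw MIBenum value needs more room than either proposal offers.
  writeln('2059 fits in 6 bits: ', HighestMibEnum <= MaxPackedIndex);
  writeln('2059 fits in a byte: ', HighestMibEnum <= High(byte));
  writeln('2059 fits in a word: ', HighestMibEnum <= High(word));
end.
```

The packing itself works, but only for a private index of at most 64 entries; any scheme that stores standardized or platform code page numbers directly needs the 2 bytes mentioned at the start of this sub-thread.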
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:
> I try to prove the exciting statement. How can I build a startup compiler for the cpstrnew branch or how to compile the cpstrnew branch?

I use the following bat file in the compiler dir:

[copy of my bat]
@echo off
c:\programming\cpstrnew\bin\i386-win32\make.exe clean all OPT=-g
copy ppc386.exe ..\bin\i386-win32\ /y
copy utils\*.exe ..\bin\i386-win32\ /y
[/copy of my bat]

Initially I copied the fpc 2.2.4 bin directory to the cpstrnew checkout directory.

To build the rtl I use the following bat:

[copy of my bat]
@echo off
..\bin\i386-win32\make.exe clean all OPT=-g
[/copy of my bat]

Best regards, Paul Ishenin.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Tomas Hajny wrote:
> I'm afraid that your assumption about the number of codepages is incorrect. The MIBENUM value defined in list at http://www.iana.org/assignments/character-sets goes up to 2059 at the moment.

Is FPC realistically going to implement all 2059 of those? I still think even using the 63 most frequently used code page values should be ample to keep FPC developers happy for quite some time.

But thank you, I stand corrected. Just goes to show, you should never assume anything when it comes to programming. ;-)

Regards,
- Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 13:49, Graeme Geldenhuys wrote:
> Is FPC realistically going to implement all 2059 of those?

We might implement 1 or 2 of those. Most of them will however be handled by libiconv, the Windows code page conversion APIs, or some other external library (just like with the current widestring manager).

Jonas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Tomas Hajny wrote:
> I'm afraid that your assumption about the number of codepages is incorrect. The MIBENUM value defined in list at http://www.iana.org/assignments/character-sets goes up to 2059 at the moment.

I just double checked. That number is in theory only. There are not nearly that many in reality. For example, in the excerpt below the numbers jump from 119 to 1000, and later from 1020 to 2000. So there are HUGE gaps in that theoretical range.

...
Name: KZ-1048
MIBenum: 119
Source: See http://www.iana.org/assignments/charset-reg/KZ-1048 [Veremeev, Kikkarin]
Alias: STRK1048-2002
Alias: RK1048
Alias: csKZ1048

Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode this needs to specify network byte order: the standard does not specify (it is a 16-bit integer space)
Alias: csUnicode
...

Regards,
- Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wednesday 11 November 2009 13:47:43 Florian Klaempfl wrote:
>> I try to prove the exciting statement. How can I build a startup compiler for the cpstrnew branch or
> I use the 2.2.4/2.4.0 ide to build the compiler, so make in the cpstrnew branch compiler dir should work using 2.2.4/2.4.0

program overhead;
{$ifdef FPC}{$mode objfpc}{$h+}{$endif}
{$ifdef mswindows}{$apptype console}{$endif}
uses
 sysutils;
var
 str1: unicodestring;
begin
 str1:= 'abcde';
end.

Free Pascal Compiler version 2.3.1 [2009/11/11] for i386
Copyright (c) 1993-2009 by Florian Klaempfl
Target OS: Linux for i386
Compiling overhead.pas
Compiling /home/mse/packs/standard/svn/fp/cpstrnew/rtl/linux/system.pp
Fatal: Compilation aborted
An unhandled exception occurred at $FFF4 :
EAccessViolation : Access violation
  $FFF4
  $0813A594  DO_GENERATE_CODE,  line 1432 of psub.pas
  $0813A4D3  READ_PROC_BODY,  line 1508 of psub.pas
  $0813A8CA  READ_PROC,  line 1632 of psub.pas
  $0813ABA1  READ_DECLARATIONS,  line 1721 of psub.pas
  $08137884  BLOCK,  line 172 of psub.pas
  $08139F4E  TCGPROCINFO__PARSE_BODY,  line 1318 of psub.pas
  $08186A15  PROC_UNIT,  line 1127 of pmodules.pas
  $0816F93D  COMPILE,  line 394 of parser.pas
  $0817E22C  TPPUMODULE__LOADPPU,  line 1531 of fppu.pas
  $0818520C  ADDUNIT,  line 440 of pmodules.pas
  $0818555A  LOADDEFAULTUNITS,  line 556 of pmodules.pas
  $08188AF7  PROC_PROGRAM,  line 2005 of pmodules.pas
  $0816F977  COMPILE,  line 402 of parser.pas
  $0806986C  COMPILE,  line 246 of compiler.pas
  $08048256  main,  line 223 of pp.pas

pp.pas compiled with fixes_2_4 by

/home/mse/packs/standard/svn/fp/fixes_2_4/compiler/ppc386 -opp -Fu./systems/ -Fi./systems/ -Fl./systems/ -Fo./systems/ -Fu./x86/ -Fi./x86/ -Fl./x86/ -Fo./x86/ -Fu./i386/ -Fi./i386/ -Fl./i386/ -Fo./i386/ -l -Mobjfpc -Sh -dsvnfixes_2_4 -dI386 -gl pp.pas

-dsvnfixes_2_4 switches -F* directories in .fpc.cfg.

Martin
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wed, November 11, 2009 13:56, Graeme Geldenhuys wrote:
> Tomas Hajny wrote:
>> I'm afraid that your assumption about the number of codepages is incorrect. The MIBENUM value defined in list at http://www.iana.org/assignments/character-sets goes up to 2059 at the moment.
> I just double checked. That number is in theory only. There are not nearly that amount in reality. For example: Below the numbers jump from 119 to 1000, then from 1020 to 2000. So there are HUGE gaps in that theoretical range.

Yes, indeed - see my second e-mail. Still, the number makes a difference.

Tomas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:
> On Wednesday 11 November 2009 13:47:43 Florian Klaempfl wrote:
>> I use the 2.2.4/2.4.0 ide to build the compiler, so make in the cpstrnew branch compiler dir should work using 2.2.4/2.4.0
> program overhead;
> [...]
> Compiling /home/mse/packs/standard/svn/fp/cpstrnew/rtl/linux/system.pp
> Fatal: Compilation aborted
> An unhandled exception occurred at $FFF4 :
> EAccessViolation : Access violation
> [...]
> pp.pas compiled with fixes_2_4 by [...]

What rtl did you use? You need one from the branch.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wednesday 11 November 2009 15:11:07 Florian Klaempfl wrote:
> What rtl did you use? You need one from the branch.

Compiling the cpstrnew rtl with fixes_2_4 does not work:

Free Pascal Compiler version 2.3.1 [2009/11/01] for i386
Copyright (c) 1993-2009 by Florian Klaempfl
Target OS: Linux for i386
Compiling pp.pas
pp.pas(158,6) Warning: MINSTACKSIZE is not supported by the target OS
Compiling /home/mse/packs/standard/svn/fp/cpstrnew/rtl/linux/system.pp
systemh.inc(296,31) Fatal: Syntax error, ; expected but found
Fatal: Compilation aborted

No surprise. ;-)

make clean all in cpstrnew/compiler with FPC 2.2.4 and compiling overhead.pas with the new ppc386:

Free Pascal Compiler version 2.3.1 [2009/11/11] for i386
Copyright (c) 1993-2009 by Florian Klaempfl
Target OS: Linux for i386
Compiling overhead.pas
Compiling /home/mse/packs/standard/svn/fp/cpstrnew/rtl/linux/system.pp
Fatal: Compilation aborted
An unhandled exception occurred at $FFF4 :
EAccessViolation : Access violation
  $FFF4
  $081310A4
  $08130FE4
  $081313DA
  $081316B1
  $0812E384
  $08130A5E
  $0817D3B3
  $08165E2D
  $08174B0C
  $0817BB8C
  $0817BEDA
  $0817F4A7
  $08165E67
  $080628C6
  $08048236

Building the rtl with the new compiler:

#!/bin/sh
fpcdir=/home/mse/packs/standard/svn/fp/cpstrnew
cd ..
cd ${fpcdir}/rtl
make clean
cd ..
make OPT="-gl -O-" FPC=${fpcdir}/compiler/ppc386 rtl

[...]
/home/mse/packs/standard/svn/fp/cpstrnew/compiler/ppc386 -Ur -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -Fi../unix -Fii386 -FE. -FU/home/mse/packs/standard/svn/fp/cpstrnew/rtl/units/i386-linux -gl -O- -di386 -dRELEASE ../unix/cwstring.pp
cwstring.pp(731,26) Error: Incompatible types: got address of procedure(PWideChar,var AnsiString,LongInt);Register expected procedure variable type of procedure(PWideChar,var RawByteString,Word,LongInt);Register
cwstring.pp(732,26) Error: Incompatible types: got address of procedure(PChar,var WideString,LongInt);Register expected procedure variable type of procedure(PChar,Word,var WideString,LongInt);Register
cwstring.pp(756,29) Error: Incompatible types: got address of procedure(PWideChar,var AnsiString,LongInt);Register expected procedure variable type of procedure(PUnicodeChar,var RawByteString,Word,LongInt);Register
cwstring.pp(757,29) Error: Incompatible types: got address of procedure(PChar,var WideString,LongInt);Register expected procedure variable type of procedure(PChar,Word,var UnicodeString,LongInt);Register
cwstring.pp(780) Fatal: There were 4 errors compiling module, stopping
Fatal: Compilation aborted
make[2]: *** [cwstring.ppu] Fehler 1
make[2]: Leaving directory `/home/mse/packs/standard/svn/fp/cpstrnew/rtl/linux'
make[1]: *** [linux_all] Fehler 2
make[1]: Leaving directory `/home/mse/packs/standard/svn/fp/cpstrnew/rtl'
make: *** [rtl] Fehler 2

Martin
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:
> On Wednesday 11 November 2009 15:11:07 Florian Klaempfl wrote:
>> What rtl did you use? You need one from the branch.
> Compiling the cpstrnew rtl with fixes_2_4 does not work:

1. Build compiler executable with 2.2.4 / 2.4.0
2. Build RTL with the new executable
3. Rebuild the compiler with the new executable and built RTL

Best regards, Paul Ishenin.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Paul Ishenin wrote:
> Martin Schreiber wrote:
>> On Wednesday 11 November 2009 15:11:07 Florian Klaempfl wrote:
>>> What rtl did you use? You need one from the branch.
>> Compiling the cpstrnew rtl with fixes_2_4 does not work:
> 1. Build compiler executable with 2.2.4 / 2.4.0
> 2. Build RTL with the new executable
> 3. Rebuild the compiler with the new executable and built RTL

That can be accomplished with the following command in the compiler dir:

make cycle PP=/path/to/ppc386-2.2.4

Vincent
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Vincent Snijders wrote:
> Paul Ishenin wrote:
>> 1. Build compiler executable with 2.2.4 / 2.4.0
>> 2. Build RTL with the new executable
>> 3. Rebuild the compiler with the new executable and built RTL
> That can be accomplished with the following command in the compiler dir:
>
> make cycle PP=/path/to/ppc386-2.2.4

Sorry, that may not be true. It assumes that the new RTL can be built with fpc 2.2.4.

Vincent
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wed, November 11, 2009 14:10, Jonas Maebe wrote:
> On 11 Nov 2009, at 13:49, Graeme Geldenhuys wrote:
>> Is FPC realistically going to implement all 2059 of those?
> We might implement 1 or 2 of those. Most of them will however be handled by libiconv, the Windows code page conversion APIs, or some other external library (just like with the current widestring manager).

Nevertheless: is e.g. ISO 8859-2 character set referenced the same way under different platforms (in the new concept), or would the new codepage number contain different values depending on the host platform? Does libiconv allow referencing the character sets using some numeric identifier at all? If yes, where are these identifiers defined? As an example, MS Windows addresses ISO 8859-2 as codepage number 28592 whereas OS/2 uses codepage number 912.

Tomas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Paul Ishenin wrote:
> Martin Schreiber wrote:
>> On Wednesday 11 November 2009 15:11:07 Florian Klaempfl wrote:
>>> What rtl did you use? You need one from the branch.
>> Compiling the cpstrnew rtl with fixes_2_4 does not work:
> 1. Build compiler executable with 2.2.4 / 2.4.0
> 2. Build RTL with the new executable

This fails here with the error Martin got too:

Compiling ../unix/cwstring.pp
cwstring.pp(731,26) Error: Incompatible types: got address of procedure(PWideChar,var AnsiString,LongInt);Register expected procedure variable type of procedure(PWideChar,var RawByteString,Word,LongInt);Register
cwstring.pp(732,26) Error: Incompatible types: got address of procedure(PChar,var WideString,LongInt);Register expected procedure variable type of procedure(PChar,Word,var WideString,LongInt);Register
cwstring.pp(756,29) Error: Incompatible types: got address of procedure(PWideChar,var AnsiString,LongInt);Register expected procedure variable type of procedure(PUnicodeChar,var RawByteString,Word,LongInt);Register
cwstring.pp(757,29) Error: Incompatible types: got address of procedure(PChar,var WideString,LongInt);Register expected procedure variable type of procedure(PChar,Word,var UnicodeString,LongInt);Register
cwstring.pp(780) Fatal: There were 4 errors compiling module, stopping

Vincent
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Vincent Snijders wrote:
> This fails here with the error Martin got too: Compiling ../unix/cwstring.pp

Neither Florian nor I have touched Linux yet.

Best regards, Paul Ishenin.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Tomas Hajny said:
>> We might implement 1 or 2 of those. Most of them will however be handled by libiconv, the Windows code page conversion APIs, or some other external library (just like with the current widestring manager).
> Nevertheless: is e.g. ISO 8859-2 character set referenced the same way under different platforms (in the new concept), or would the new codepage number contain different values depending on the host platform? Does libiconv allow referencing the character sets using some numeric identifier at all? If yes, where are these identifiers defined? As an example, MS Windows addresses ISO 8859-2 as codepage number 28592 whereas OS/2 uses codepage number 912.

Yes, this is a problem. When I made the unicode document I thought about this too, and no solution is perfect (using windows everywhere is strange for users, but you don't want to break Delphi per se). So I came up with a compromise (solution 3 below).

There are three solutions:

1 delphi compatible, always use Windows encodings.
2 define a handful of constants that map to the encoding on the given platform. FPC_CODEPAGE_8859_2 =...
3 a mix of 1 and a handful of own constants: take a range of say 30-50 values that are free in the Windows range. Have a per platform table that maps these 50 values to native codepage numbers. The indexes into this table get nice names like in option 2.

This way in the encoding translate routine you can do

if (encoding>=fpc_encoding_low) and (encoding<=fpc_encoding_high) then
  begin
    encoding:=fpc_encodingtable[encoding-FPC_encoding_low]; // cheap lookup
  end
else
  begin
    encoding:=windowsencoding2nativeencoding[encoding];
  end;

Delphi users would only have to define the fpc constants of (2) to their respective windows codepages to keep the code working.
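Editor's illustration of solution 3 as a complete program. This is a sketch under stated assumptions, not code from the branch: the reserved range 60000..60002 is a hypothetical choice (nobody in the thread picked real values), the native numbers are OS/2-style code pages standing in for "the local platform", and the Windows-to-native conversion is written as a function with a three-entry case statement rather than a full table. Only the pair 28592 / 912 for ISO 8859-2 is taken from the thread; the other values are the editor's.

```pascal
program cpmap;
{$mode objfpc}{$h+}

const
  // Hypothetical FPC-reserved range inside the 16-bit code page space.
  fpc_encoding_low  = 60000;
  fpc_encoding_high = 60002;

  // Named constants in the style of option 2; each is an index into the table.
  FPC_CODEPAGE_8859_1 = fpc_encoding_low + 0;
  FPC_CODEPAGE_8859_2 = fpc_encoding_low + 1;
  FPC_CODEPAGE_UTF8   = fpc_encoding_low + 2;

  // Per-platform table: FPC index -> native code page number.
  // The values below assume an OS/2-like platform.
  fpc_encodingtable: array[0..fpc_encoding_high - fpc_encoding_low] of word =
    (819, 912, 1208);

// Slow path: Windows code page number -> native code page number.
// Only three entries are listed; a real table would cover far more.
function windowsencoding2nativeencoding(cp: word): word;
begin
  case cp of
    28591: result := 819;   // ISO 8859-1
    28592: result := 912;   // ISO 8859-2 (numbers quoted in the thread)
    65001: result := 1208;  // UTF-8
  else
    result := cp;           // unknown value: passed through unchanged (placeholder policy)
  end;
end;

// Translate either an FPC constant or a Windows code page to the native number.
function translateencoding(encoding: word): word;
begin
  if (encoding >= fpc_encoding_low) and (encoding <= fpc_encoding_high) then
    result := fpc_encodingtable[encoding - fpc_encoding_low]  // cheap lookup
  else
    result := windowsencoding2nativeencoding(encoding);
end;

begin
  writeln('FPC_CODEPAGE_8859_2 -> ', translateencoding(FPC_CODEPAGE_8859_2));
  writeln('Windows 28592       -> ', translateencoding(28592));
  writeln('FPC_CODEPAGE_UTF8   -> ', translateencoding(FPC_CODEPAGE_UTF8));
end.
```

Both the named constant and the Windows number for ISO 8859-2 end up at the same native value, which is the compatibility property the proposal relies on. If the reserved range ever collided with a value Microsoft starts using, only the two range constants would have to move.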
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wednesday 11 November 2009 15:57:28 Paul Ishenin wrote:
> Martin Schreiber wrote:
>> On Wednesday 11 November 2009 15:11:07 Florian Klaempfl wrote:
>>> What rtl did you use? You need one from the branch.
>> Compiling the cpstrnew rtl with fixes_2_4 does not work:
> 1. Build compiler executable with 2.2.4 / 2.4.0
> 2. Build RTL with the new executable
> 3. Rebuild the compiler with the new executable and built RTL

Got it working without cwstring, thanks Paul.

Martin
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Wed, November 11, 2009 16:31, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
>>> We might implement 1 or 2 of those. Most of them will however be handled by libiconv, the Windows code page conversion APIs, or some other external library (just like with the current widestring manager).
>> Nevertheless: is e.g. ISO 8859-2 character set referenced the same way under different platforms (in the new concept), or would the new codepage number contain different values depending on the host platform? Does libiconv allow referencing the character sets using some numeric identifier at all? If yes, where are these identifiers defined? As an example, MS Windows addresses ISO 8859-2 as codepage number 28592 whereas OS/2 uses codepage number 912.
> Yes this is a problem. When I made the unicode document I thought about this too, and no solution is perfect. (using windows everywhere is strange for users, but you don't want to break Delphi per se) So I came up with a compromise (solution 3 below)
>
> There are three solutions:
>
> 1 delphi compatible, always use Windows encodings.
> 2 define a handful of constants that map to the encoding on the given platform. FPC_CODEPAGE_8859_2 =...
> 3 a mix of 1 and a handful of own constants: take a range of say 30-50 values that are free in the Windows range. Have a per platform table that maps these 50 values to native codepage numbers. The indexes into these table get nice names like in option 2.
>
> This way in the encoding translate routine you can do
>
> if (encoding>=fpc_encoding_low) and (encoding<=fpc_encoding_high) then
>   begin
>     encoding:=fpc_encodingtable[encoding-FPC_encoding_low]; // cheap lookup
>   end
> else
>   begin
>     encoding:=windowsencoding2nativeencoding[encoding];
>   end;
>
> Delphi users would only have to define the fpc constants of (2) to their respective windows codepages to keep the code working.

Well... How do you make sure that MS doesn't come with an extension of the supported codepages in the next version of MS Windows (or that they don't support a different list in some special version, like a version for the Chinese market) breaking your selection of 50 free values in Windows range?

How does your list of 50 values compare to 280 lines provided by (GNU) recode -l (presumably matching to high extent to values supported by the underlying libiconv library)?

Isn't it necessary to also keep the character set names under Unix (as far as I understand it, at least console character set information is provided using charset name provided in an environment variable there)?

Tomas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Tomas Hajny said:
>> begin
>>   encoding:=windowsencoding2nativeencoding[encoding];
>> end;
>>
>> Delphi users would only have to define the fpc constants of (2) to their respective windows codepages to keep the code working.
> Well... How do you make sure that MS doesn't come with an extension of the supported codepages in the next version of MS Windows (or that they don't support a different list in some special version, like a version for the Chinese market) breaking your selection of 50 free values in Windows range?

In that unlikely case, change the range.

> How does your list of 50 values compare to 280 lines provided by (GNU) recode -l (presumably matching to high extent to values supported by the underlying libiconv library)?

Like about 50/280. That's the point of most used. For the less likely ones, define constants to the windows codepages. Note that this is all just a guesstimate on the size of the free ranges. But I'd rather not expand that too much.

> Isn't it necessary to also keep the character set names under Unix (as far as I understand it, at least console character set information is provided using charset name provided in an environment variable there)?

Put them in the table too, for Unix. But what is the alternative? Delphi incompatibility? Everything homemade and incompatible?
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 09, at 20:53, Marco van de Voort wrote: In our previous episode, Tomas Hajny said: begin encoding:=windowsencoding2nativeencoding[encoding]; end; Delphi users would only have to define the fpc constants of (2) to their respective windows codepages to keep the code working. Well... How do you make sure that MS doesn't come with an extension of the supported codepages in the next version of MS Windows (or that they don't support a different list in some special version, like a version for the Chinese market) breaking your selection of 50 free values in Windows range? In that unlikely case, change the range. That raises a question whether incompatibility between two FPC versions is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)... How does your list of 50 values compare to 280 lines provided by (GNU) recode -l (presumably matching to high extent to values supported by the underlying libiconv library)? Like about 50/280. That's the point of most used. For the less likely ones, define constants to the windows codepages. I don't understand what you mean by define constants to the windows codepages. I guess that I'm missing something there but it seems to me that your proposal doesn't allow use of some of the character sets. If we want to depend on MS changing their platform specific use of certain constants, I can imagine that we should be able to find a gap in the windows character set numbering to cover at least all the character sets registered by IANA. However, we need to provide mapping between the MS Windows character set number and the native character set number for all character set numbers defined in Windows and supported by the particular platform, otherwise the compatibility argument doesn't hold any longer, does it? Note that is all just a guestimate on the size of the free ranges. But I rather not expand that too much. 
I'm pretty sure that Windows actually support fewer character sets than what is defined in IANA. Since Windows already use word values, there should be fairly large gaps. Looking at the MSDN documentation (http://msdn.microsoft.com/en-us/library/dd317756.aspx), there are 152 values defined altogether and there's currently e.g. just a single value used in the 3 range, no value in 4, nothing between 38 and 436 (probably rather unlikely to change, I'd expect changes rather in other areas), nothing between 1362 and , etc. Isn't it necessary to also keep the character set names under Unix (as far as I understand it, at least console character set information is provided using charset name provided in an environment variable there)? Put them in the table too, for Unix. From certain perspective, these text versions may be useful for all platforms (imagine HTML character set declarations). However, there's a risk that they may not be used completely consistently across all platforms (IANA definitions allow quite a few alternative versions for the character set names). BTW, the above mentioned MSDN page also refers to some string identifier supposedly used for .NET, so I suspect that these become sooner or later supported by Delphi too somehow. ;-) But what is the alternative? Delphi incompability ? Everything homemade and incompatible? We could e.g. use the MIBENUM number defined by IANA as our primary identifier, that is not homemade. But the main point is IMHO understanding how these values are used (in FPC). If they're mainly used for checking whether the string stored in memory in some character set needs to be converted before e.g. I/O operations via console then we may actually prefer using platform specific constants (i.e. different values for the same character set on different platforms) because that doesn't require any conversion (well, at least on platforms defining console codepages using numeric values). 
If we want/need to store these constants when storing strings to file streams and make the resulting files portable across platforms then we obviously need to use the same constants for all platforms. If we assume need for using the same stored streams in both Delphi and FPC programs then this needs to be compatible between Delphi and FPC (are they compatible in other aspects?). As you can see, I'm still not that clear on the use cases at the moment. Tomas ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
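The translation table discussed in this message could be sketched roughly as follows. This is only an illustration, not code from the cpstrnew branch: the record type, the table name and the function name are hypothetical, and the five entries are a tiny sample of well-known Windows code page numbers paired with common IANA names.

```pascal
program CpMapSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical table entry: Windows code page number plus a charset name
  TCodePageEntry = record
    WinCP: Word;
    IanaName: string;
  end;

const
  // Small sample only; a real table would cover every supported code page
  CodePageTable: array[0..4] of TCodePageEntry = (
    (WinCP: 866;   IanaName: 'IBM866'),
    (WinCP: 1251;  IanaName: 'windows-1251'),
    (WinCP: 1252;  IanaName: 'windows-1252'),
    (WinCP: 28591; IanaName: 'ISO-8859-1'),
    (WinCP: 65001; IanaName: 'UTF-8')
  );

// Returns the charset name for a Windows code page, or '' if it is unknown
function WinCPToName(cp: Word): string;
var
  i: Integer;
begin
  Result := '';
  for i := Low(CodePageTable) to High(CodePageTable) do
    if CodePageTable[i].WinCP = cp then
      Exit(CodePageTable[i].IanaName);
end;

begin
  WriteLn('1251  -> ', WinCPToName(1251));
  WriteLn('65001 -> ', WinCPToName(65001));
  WriteLn('12345 -> "', WinCPToName(12345), '" (not in the table)');
end.
```

On a platform that identifies character sets by name (iconv, environment variables), such a lookup would sit between the stored numeric value and the OS call. A linear search is enough for a sketch; a sorted table with binary search would be the obvious refinement.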
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 23:46, Tomas Hajny wrote:

That raises a question whether incompatibility between two FPC versions is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)...

Since they're working on a Mac OS X cross compiler, they too are going to have to address that... Someone could always try asking http://blogs.embarcadero.com/eboling/

Jonas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 2009, at 23:54, Tomas Hajny wrote: Are character sets recognized by numeric values under Mac OS X? There are two ways to deal with them. One is libiconv, like on any *nix platform (which, afaik, only supports string identifiers). The other is CFString, which uses numeric values: http://developer.apple.com/mac/library/documentation/CoreFoundation/Reference/CFStringRef/Reference/reference.html#//apple_ref/doc/constant_group/External_String_Encodings Mac OS X also has some helper routines that they (and we) may use though (especially the second and the fourth one): CFStringConvertEncodingToIANACharSetName CFStringConvertEncodingToWindowsCodepage CFStringConvertIANACharSetNameToEncoding CFStringConvertWindowsCodepageToEncoding I didn't know about them until I looked up the CFString reference. It's quite unfortunate that the new unicodestring type that Delphi added isn't opaque, since then we could just have used CFString on Mac OS X (instead of constantly creating and destroying CFStrings when we have to convert from one code page to another). Jonas ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 11 Nov 09, at 23:50, Jonas Maebe wrote:

On 11 Nov 2009, at 23:46, Tomas Hajny wrote:

That raises a question whether incompatibility between two FPC versions is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)...

Since they're working on a Mac OS X cross compiler, they too are going to have to address that... Someone could always try asking http://blogs.embarcadero.com/eboling/

Are character sets recognized by numeric values under Mac OS X?

Tomas
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On 12 Nov 09, at 0:01, Jonas Maebe wrote: On 11 Nov 2009, at 23:54, Tomas Hajny wrote: Are character sets recognized by numeric values under Mac OS X? There are two ways to deal with them. One is libiconv, like on any *nix platform (which, afaik, only supports string identifiers). The other is CFString, which uses numeric values: http://developer.apple.com/mac/library/documentation/CoreFoundation/Reference/CFStringRef/Reference/reference.html#//apple_ref/doc/constant_group/External_String_Encodings . . These Mac OS X could not be used by Delphi directly in any case because some of the values would not fit into 2 bytes (constants for all the Unicode encodings fall into this category apparently), so I guess that their choice is more or less clear, especially if Mac OS X already provides direct support for mapping between Windows codepage numbers and the internal constants out of the box... Tomas ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
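The size argument can be checked with one concrete value. The sketch below assumes the value of kCFStringEncodingUTF8 as published in Apple's CFString headers ($08000100) and compares it with the Windows code page number for UTF-8 (65001); the constants are declared locally here instead of being imported from any Mac OS X unit.

```pascal
program CfEncodingSize;
{$mode objfpc}

const
  // Value as defined in Apple's CFString headers; redeclared locally
  kCFStringEncodingUTF8 = $08000100;
  // Windows code page number for UTF-8
  CP_UTF8_WINDOWS = 65001;

begin
  WriteLn('Largest value a 2-byte field can hold: ', High(Word));
  WriteLn('Windows UTF-8 code page fits:  ', CP_UTF8_WINDOWS <= High(Word));
  WriteLn('CFString UTF-8 encoding fits:  ', kCFStringEncodingUTF8 <= High(Word));
end.
```

The Windows number just fits below 65535, while the CFString constant needs 32 bits, which is why a 2-byte codepage field favours the Windows numbering combined with a conversion routine on Mac OS X.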
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Tomas Hajny said: supported codepages in the next version of MS Windows (or that they don't support a different list in some special version, like a version for the Chinese market) breaking your selection of 50 free values in Windows range? In that unlikely case, change the range. That raises a question whether incompatibility between two FPC versions Incompatibility how exactly? Two different FPC versions are already not compatible. is better than incompatibility between FPC and Delphi (caused by tight connection between Delphi and one particular platform)... That would be source incompatibility, and therefore much worse. Like about 50/280. That's the point of most used. For the less likely ones, define constants to the windows codepages. I don't understand what you mean by define constants to the windows codepages. The 16-bit range is split between a short FPC range and a long Delphi/Windows range. Rarely used codepages use the windows codepage number, and if foreign OSes support that, they must implement a windows2local codepage number conversion. of certain constants, I can imagine that we should be able to find a gap in the windows character set numbering to cover at least all the character sets registered by IANA. Implementing at all only makes sense if OSes implement them exactly. Several Windows codepages might map to corresponding IANA sets. However, we need to provide mapping between the MS Windows character set number and the native character set number for all character set numbers defined in Windows and supported by the particular platform, otherwise the compatibility argument doesn't hold any longer, does it? Just like that you must be able to map the IANA sets to actually supported sets on all platforms. Note that is all just a guestimate on the size of the free ranges. But I rather not expand that too much. I'm pretty sure that Windows actually support fewer character sets than what is defined in IANA. 
Since Windows already use word values, there should be fairly large gaps. Looking at the MSDN documentation (http://msdn.microsoft.com/en-us/library/dd317756.aspx), there are 152 values defined altogether and there's currently e.g. just a single value used in the 3 range, no value in 4, nothing between 38 and 436 (probably rather unlikely to change, I'd expect changes rather in other areas), nothing between 1362 and , etc. If the ranges are large enough we can try to fit them in all somewhere. But this means the lesser used codepages are also in twice, blowing up lookuptables or codepages. as I understand it, at least console character set information is provided using charset name provided in an environment variable there)? Put them in the table too, for Unix. From certain perspective, these text versions may be useful for all platforms (imagine HTML character set declarations). However, there's a risk that they may not be used completely consistently across all platforms (IANA definitions allow quite a few alternative versions for the character set names). BTW, the above mentioned MSDN page also refers to some string identifier supposedly used for .NET, so I suspect that these become sooner or later supported by Delphi too somehow. ;-) I'd wait till this is entirely sure before exposing these names, and only on platforms that need them. Otherwise we find ourselves with 3 strings per codepage on all platforms before long in any library. Moreover, many OSes might already provide a way to resolve numbers to names. But what is the alternative? Delphi incompability ? Everything homemade and incompatible? We could e.g. use the MIBENUM number defined by IANA as our primary identifier, that is not homemade. But the main point is IMHO understanding how these values are used (in FPC). If they're mainly used for checking whether the string stored in memory in some character set needs to be converted before e.g. 
I/O operations via console then we may actually prefer using platform specific constants (i.e. different values for the same character set on different platforms) because that doesn't require any conversion (well, at least on platforms defining console codepages using numeric values). If we want/need to store these constants when storing strings to file streams and make the resulting files portable across platforms then we obviously need to use the same constants for all platforms. If we assume need for using the same stored streams in both Delphi and FPC programs then this needs to be compatible between Delphi and FPC (are they compatible in other aspects?). As you can see, I'm still not that clear on the use cases at the moment. It would greatly confuse FPC-Delphi projects for a nearly sterile benefit. The problem is not even the change itself, but actually hunting them down, ifdefing them, getting the changes
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Paul Ishenin wrote:

3. RawByteString type (an ansi string type which does not perform any codepage conversions) with compiler support.

A great idea! I would have requested this if it were not already planned :).

-Michael
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Paul Ishenin wrote:

4. Most of the codepage conversion methods and string type conversion methods (ansistring = unicodestring = widestring)

Does this mean that the compiler already automatically calls the appropriate conversion, but the code that actually performs it is not yet implemented in the RTL?

-Michael
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote: Performance and simplicity. MSEgui does not need the overhead of multi-encoding/multi-charsize. At the moment msestring=UnicodeString for FPC 2.4 which is perfect. I fear FPC will drop this simple solution where it was ahead of Delphi. Hmmm. I suppose with Linux (using utf-8 for the GUI interface to the user programs), having the user program always use Widestring internally and convert any GUI input and output string will feature an enormous overhead. I feel that just handling the multi-encoding String-management record (with encoding6 and character-size, which, in the case of GUI generated and meant for GUI Strings, always is equal), would not impose much overhead over using using the plain old String management with just reference-counter and contents-pointer. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Michael Schnell wrote:

Paul Ishenin wrote:

4. Most of the codepage conversion methods and string type conversion methods (ansistring = unicodestring = widestring)

Does this mean that the compiler already automatically calls the appropriate conversion, but the code that actually performs it is not yet implemented in the RTL?

It is already implemented in the RTL (for win32 only at the moment). But there are many other things which are not implemented: the various Pos, Delete and Copy routines, and string concatenation. Open rtl\inc\astrings.inc for more info.

Best regards, Paul Ishenin.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber schrieb: On Tuesday 10 November 2009 12:45:26 Michael Schnell wrote: Martin Schreiber wrote: Performance and simplicity. MSEgui does not need the overhead of multi-encoding/multi-charsize. At the moment msestring=UnicodeString for FPC 2.4 which is perfect. I fear FPC will drop this simple solution where it was ahead of Delphi. Hmmm. I suppose with Linux (using utf-8 for the GUI interface to the user programs), having the user program always use Widestring internally and convert any GUI input and output string will feature an enormous overhead. Xlib and Xft have an utf-16 interface. Windows has an utf-16 interface. MSEgui is faster than Lazarus with Gtk2 and even faster as Lazarus with Gtk1 on Linux IIRC. I feel that just handling the multi-encoding String-management record (with encoding6 and character-size, which, in the case of GUI generated and meant for GUI Strings, always is equal), would not impose much overhead over using using the plain old String management with just reference-counter and contents-pointer. Did you look into the code in cpstrnew branch? I did. And which changes are exactly the reason for your concerns? ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Tuesday 10 November 2009 18:33:54 Florian Klaempfl wrote:

Did you look into the code in cpstrnew branch? I did. And which changes are exactly the reason for your concerns?

More checks and more complicated address calculation. OK, you can say you will not notice the small difference. But possibly this attitude is one of the reasons that Delphi compiles much faster than FPC? ;-)

Martin
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:

On Tuesday 10 November 2009 18:33:54 Florian Klaempfl wrote:

Did you look into the code in cpstrnew branch? I did. And which changes are exactly the reason for your concerns?

More checks and more complicated address calculation. OK, you can say you will not notice the small difference. But possibly this attitude is one of the reasons that Delphi compiles much faster than FPC? ;-)

Hey, aren't you the one saying not to follow Delphi?

Luiz
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:

On Tuesday 10 November 2009 18:33:54 Florian Klaempfl wrote:

Did you look into the code in cpstrnew branch? I did. And which changes are exactly the reason for your concerns?

More checks

Where? A pure unicodestring routine won't get additional checks.

and more complicated address calculation.

Where? Adding 16 instead of 12 makes no difference. The major difference will be the initialization of the additional fields and the increased memory requirement of 4 bytes per string.

OK, you can say you will not notice the small difference. But possibly this attitude is one of the reasons that Delphi compiles much faster than FPC? ;-)

Yes, and if compilation speed is important for you, you should use Delphi ;) What you call "this attitude" is also the reason why FPC can exist: we always need to find a compromise between:

- satisfying the needs of as many users as possible
- maintainability
- portability
- performance
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Tue, 10 Nov 2009, Florian Klaempfl wrote:

Martin Schreiber wrote:

On Tuesday 10 November 2009 18:33:54 Florian Klaempfl wrote:

Did you look into the code in cpstrnew branch? I did. And which changes are exactly the reason for your concerns?

More checks

Where? A pure unicodestring routine won't get additional checks.

and more complicated address calculation.

Where? Adding 16 instead of 12 makes no difference. The major difference will be the initialization of the additional fields and the increased memory requirement of 4 bytes per string.

OK, you can say you will not notice the small difference. But possibly this attitude is one of the reasons that Delphi compiles much faster than FPC? ;-)

Yes, and if compilation speed is important for you, you should use Delphi ;) What you call "this attitude" is also the reason why FPC can exist: we always need to find a compromise between:

- satisfying the needs of as many users as possible
- maintainability
- portability
- performance

Portability especially is what makes the difference. It is all about available time. If we only had to support one target processor and one OS, we could spend *much* more time on optimisation of the generated code. But as it is, we have to divide our attention between i386, x86_64, PowerPC (32/64 bit), SPARC and ARM. That's 6 CPUs and at least as many operating systems. Given the time limits, the first thing to go out of the window is therefore speed.

MSE boasts some 'great features'. Well, last time I checked (admittedly some time ago), MSE didn't even support 64 bits, while Lazarus has for many years. What to spend time on is a decision each makes for himself: Martin chose features over platforms. We choose platforms over optimization...

Michael.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Tuesday 10 November 2009 19:08:45 Florian Klaempfl wrote: Where? A pure unicodestring routine won't get additional checks. What is a pure unicodestring routine? and more complicated address calculation. Where? Adding 16 instead of 12 makes no difference. The major difference will be the initialization of the additional fields and the increased memory requirement of 4 bytes per string. OK, so you say that the processing of the new and the current UnicodeString and therefore the RTL and compiler procedures are identical with exception of the initialization of 4 bytes with a constant? Now that is exciting. Martin ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Martin Schreiber wrote:

On Tuesday 10 November 2009 19:08:45 Florian Klaempfl wrote:

Where? A pure unicodestring routine won't get additional checks.

What is a pure unicodestring routine?

One where no other string types are involved, especially things like RawByteString.

and more complicated address calculation.

Where? Adding 16 instead of 12 makes no difference. The major difference will be the initialization of the additional fields and the increased memory requirement of 4 bytes per string.

OK, so you say that the processing of the new and the current UnicodeString and therefore the RTL and compiler procedures are identical with exception of the initialization of 4 bytes with a constant?

Well, two times two bytes ;)
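The "16 instead of 12" and "two times two bytes" figures can be pictured with a pair of records. This is a sketch under assumptions: the field names and their order are hypothetical and not taken from the cpstrnew sources; only the sizes (12 bytes before, two extra 2-byte fields after) come from the discussion.

```pascal
program HeaderSketch;
{$mode objfpc}

type
  // Hypothetical 12-byte string header as used before the branch (32-bit target)
  TOldHeader = packed record
    MaxLen: LongInt;
    Len: LongInt;
    RefCount: LongInt;
  end;

  // Hypothetical header with the two additional 2-byte fields
  TNewHeader = packed record
    CodePage: Word;      // code page of the string data
    ElementSize: Word;   // bytes per character element
    MaxLen: LongInt;
    Len: LongInt;
    RefCount: LongInt;
  end;

begin
  WriteLn('old header: ', SizeOf(TOldHeader), ' bytes');
  WriteLn('new header: ', SizeOf(TNewHeader), ' bytes');
  WriteLn('extra per string: ', SizeOf(TNewHeader) - SizeOf(TOldHeader), ' bytes');
end.
```

With such a layout the character data still sits at a fixed offset from the start of the header, so finding it costs the same single addition as before; only the constant changes from 12 to 16.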
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Tuesday 10 November 2009 20:20:06 Florian Klaempfl wrote:

OK, so you say that the processing of the new and the current UnicodeString and therefore the RTL and compiler procedures are identical with exception of the initialization of 4 bytes with a constant?

Well, two times two bytes ;)

Already a possibility for optimization? 1 * 4 bytes is faster than 2 * 2 bytes AFAIK? ;-)

Martin
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Michael Schnell said: multi-encoding/multi-charsize. At the moment msestring=UnicodeString for FPC 2.4 which is perfect. I fear FPC will drop this simple solution where it was ahead of Delphi. Hmmm. I suppose with Linux (using utf-8 for the GUI interface to the user programs), having the user program always use Widestring internally and convert any GUI input and output string will feature an enormous overhead. The real GUI part is not the real performance dependant part. GTK2 slow performance comes from little optimizations and portability layering, much less design choices. The only problem is the db-aware part and the few other widgets that can have 10 elements and more, like treeview. There mass conversion can hurt, e.g. when loading the data into the widget. That is probably solvable (caching, only translating at the moment of viewing etc). But in general I think codepage conversions are a different magnitude from general processing. I feel that just handling the multi-encoding String-management record (with encoding6 and character-size, which, in the case of GUI generated and meant for GUI Strings, always is equal), would not impose much overhead over using using the plain old String management with just reference-counter and contents-pointer. Neither do I. While I think it would be best to use native encoding on all platforms as much as possible, that is an opinion. However not using native encoding for general processing is nuts. So we need the UTF8 type anyway. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Marco van de Voort wrote:

Neither do I. While I think it would be best to use native encoding on all platforms as much as possible, that is an opinion. However not using native encoding for general processing is nuts. So we need the UTF8 type anyway.

Just to make a small point: the choice for UTF16 was made for market reasons, not technical ones. The Chinese, Korean and Japanese markets are important for Delphi. Try to figure out how to fit that in UTF8. Just a thought.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, thaddy said:

Marco van de Voort wrote:

While I think it would be best to use native encoding on all platforms as much as possible, that is an opinion. However not using native encoding for general processing is nuts. So we need the UTF8 type anyway.

Just to make a small point: the choice for UTF16 was made for market reasons, not technical ones. The Chinese, Korean and Japanese markets are important for Delphi. Try to figure out how to fit that in UTF8. Just a thought.

I don't think that is the reason. Codegear/Borland simply followed Microsoft. Period. It might have been a reason for Microsoft though, but that decision was made long ago, before 1994-95.
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Tuesday 10 November 2009 21:38:33 Marco van de Voort wrote: The only problem is the db-aware part and the few other widgets that can have 10 elements and more, like treeview. There mass conversion can hurt, e.g. when loading the data into the widget. That's the reason why MSEgui stores DB strings as UnicodeString in tmsebufdatset and the other DB access components. Converting utf-8 - utf-16 - utf-8 once while receiving/sending the data over the wire is fast compared to the data transmission, you will not notice the small difference. ;-) The MSEgui list and tree components store stringdata as UnicodeString of course. Martin ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
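The "convert once at the wire boundary" approach described here can be sketched with the RTL routines UTF8Encode and UTF8Decode. The variable names are made up for the illustration and nothing below is taken from MSEgui; the test string is built from a WideChar so the result does not depend on the source file's code page.

```pascal
program RoundTrip;
{$mode objfpc}{$H+}

var
  InMemory, Back: UnicodeString;
  OnWire: UTF8String;
begin
  // "cafe" with an accented last letter (U+00E9), kept as UTF-16 in memory
  InMemory := UnicodeString('caf') + WideChar($00E9);

  // Convert once when sending...
  OnWire := UTF8Encode(InMemory);
  // ...and once when receiving
  Back := UTF8Decode(OnWire);

  WriteLn('UTF-16 code units in memory: ', Length(InMemory));
  WriteLn('UTF-8 bytes on the wire:     ', Length(OnWire));
  WriteLn('round trip intact:           ', Back = InMemory);
end.
```

All processing between the two conversions then works on a single fixed string type, which is the simplicity argument made in this message.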
Re: [fpc-devel] Freepascal 2.4.0rc1 released
Marco van de Voort wrote:

We have placed the first release-candidate of the Free Pascal Compiler version 2.4.0 on our ftp-servers.

Thanks for the note. I wanted to update the Git mirror by tagging the 2.4.0rc1 revision, but looking in SubVersion, that was done two weeks ago? Is this announcement simply 2 weeks late?

release_2_4_0_rc1/ 13941 2 weeks marco* regenned makefiles

Also, why does the "what's new" in your announcement go all the way back to FPC v1.9.0? v1.9.0 is very old!!

Regards, - Graeme -
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Freepascal 2.4.0rc1 released
On Mon, November 9, 2009 13:57, Graeme Geldenhuys wrote: Marco van de Voort wrote: We have placed the first release-candidate of the Free Pascal Compiler version 2.4.0 on our ftp-servers. Thanks for the note. I wanted to update the Git mirror by tagging the 2.4.0rc1 revision, but looking in SubVersion, that was done two weeks ago? Is this announcements simply 2 weeks late? release_2_4_0_rc1/ 13941 2 weeks marco* regenned makefiles No, it just took some time until we managed to build the release for the various supported platforms. Tomas ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Freepascal 2.4.0rc1 released
In our previous episode, Graeme Geldenhuys said:

Marco van de Voort wrote:

We have placed the first release-candidate of the Free Pascal Compiler version 2.4.0 on our ftp-servers.

Thanks for the note. I wanted to update the Git mirror by tagging the 2.4.0rc1 revision, but looking in SubVersion, that was done two weeks ago? Is this announcement simply 2 weeks late?

No. It is exactly one day since the last platform was updated, IOW the two weeks were for building.

release_2_4_0_rc1/ 13941 2 weeks marco* regenned makefiles

Also, why does the "what's new" in your announcement go all the way back to FPC v1.9.0? v1.9.0 is very old!!

Thanks, I just copied the whatsnew.txt, but should have cut the history. Will think about it next time.
Re: [fpc-devel] Freepascal 2.4.0rc1 released
Graeme Geldenhuys wrote:

Marco van de Voort wrote:

We have placed the first release-candidate of the Free Pascal Compiler version 2.4.0 on our ftp-servers.

Thanks for the note. I wanted to update the Git mirror by tagging the 2.4.0rc1 revision, but looking in SubVersion, that was done two weeks ago? Is this announcement simply 2 weeks late?

release_2_4_0_rc1/ 13941 2 weeks marco* regenned makefiles

Sorry, I'm slightly confused as to how you guys manage the version process in FPC. Above is the 'tag' created in SubVersion, but its last change was 2 weeks ago. In Git, I am tracking 'branches/fixes_2_4', which was last updated in r14066 (3 days ago), so should I tag this revision as 2.4.0rc1? Or should I tag r13941 (from /tags/release_2_4_0_rc1) in the Git mirror?

Regards, - Graeme -
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Freepascal 2.4.0rc1 released
In our previous episode, Graeme Geldenhuys said:

Thanks for the note. I wanted to update the Git mirror by tagging the 2.4.0rc1 revision, but looking in SubVersion, that was done two weeks ago? Is this announcement simply 2 weeks late?

release_2_4_0_rc1/ 13941 2 weeks marco* regenned makefiles

Sorry, I'm slightly confused as to how you guys manage the version process in FPC. Above is the 'tag' created in SubVersion, but its last change was 2 weeks ago. In Git, I am tracking 'branches/fixes_2_4', which was last updated in r14066 (3 days ago), so should I tag this revision as 2.4.0rc1? Or should I tag r13941 (from /tags/release_2_4_0_rc1) in the Git mirror?

The tag is the 2.4.0RC1. The final release will probably branch from fixes_2_4 at the RC1 branchpoint, and then have some critical patches merged back from fixes_2_4. After 2.4.0 is out, the fixes_2_4 branch will be updated to version number 2.4.1.
Re: [fpc-devel] Freepascal 2.4.0rc1 released
How are the plans for D2009 strings (or some other explicit (ANSISTRING, UTF8STRING) or runtime-automatic Unicode handling) going?

-Michael
Re: [fpc-devel] Freepascal 2.4.0rc1 released
Michael Schnell wrote:

How are the plans for D2009 strings (or some other explicit (ANSISTRING, UTF8STRING) or runtime-automatic Unicode handling) going?

Check out the cpstrnew branch and help the development ;)

Best regards, Paul Ishenin.
Re: [fpc-devel] Freepascal 2.4.0rc1 released
Paul Ishenin wrote:

Check out the cpstrnew branch and help the development ;)

OK. :) Has a final decision been made on what will be implemented (D2009 strings or whatever)?

-Michael
Re: [fpc-devel] Freepascal 2.4.0rc1 released
Is this page http://wiki.freepascal.org/FPC_Unicode_support (saying Upcoming Delphi release codenamed Tiburon will natively support Unicode) still valid ? ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Michael Schnell wrote:

Is this page http://wiki.freepascal.org/FPC_Unicode_support (saying "Upcoming Delphi release codenamed Tiburon will natively support Unicode") still valid?

At least the branch link is valid :) Yesterday I asked Florian to permit me to help him with the development, so at the moment I don't know everything about the branch, but I already have some info. The branch tries to implement codepage string support similar to D2009. What is already done there:

1. The internal representation of the string type has changed. It now has elementsize and codepage fields.
2. A new string type declaration syntax, although it differs from Delphi. The FPC syntax is: Cp1251StringType = string1251 and Delphi does it like this: Cp1251StringType = type AnsiString(1251)
3. A RawByteString type (an ansi string type which does not perform any codepage conversions) with compiler support, although this support differs a bit from Delphi. For example, Delphi does not prefer RawByteString to other ansi string types when resolving overloaded functions. In Delphi it is impossible to pass an ansi string ( RawByteString) to functions by reference (using var or out), while in FPC it is possible. Delphi does allow this for some internal functions, though (not fair).
4. Most of the codepage conversion methods and string type conversion methods (ansistring = unicodestring = widestring).

But the RTL still needs a lot of work and review. Functions which have AnsiString arguments will convert all your codepage strings to the default system encoding, so where needed these must be replaced by the RawByteString type (for WriteLn, ReadLn, etc.). Today I committed a few fixes to that branch. I also have a few uncommitted changes which make codepage string concatenation and WriteLn work, but they make the compiler build fail and thus need more attention :)

Best regards, Paul Ishenin.
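To make points 2 and 3 a little more concrete, here is a minimal sketch. It uses the Delphi-style declaration quoted above and assumes a later FPC release that accepts that syntax and provides StringCodePage; it is not code from the branch as described in this message, and the procedure name is hypothetical.

```pascal
program CpStringSketch;
{$mode objfpc}{$H+}

type
  // Delphi-style declaration quoted above; assumes a compiler that accepts it
  Cp1251String = type AnsiString(1251);

// A RawByteString parameter takes the string as-is, without converting it
// to the default system code page first
procedure ShowRaw(const s: RawByteString);
begin
  WriteLn('code page: ', StringCodePage(s), ', bytes: ', Length(s));
end;

var
  s: Cp1251String;
begin
  s := 'abc';
  ShowRaw(s);
end.
```

Declaring the parameter as AnsiString instead would trigger the conversion to the default system encoding that the message warns about, which is why RawByteString is suggested for routines such as WriteLn and ReadLn.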
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Martin Schreiber said: http://wiki.freepascal.org/FPC_Unicode_support (saying Upcoming Delphi release codenamed Tiburon will natively support Unicode) still valid ? At least the branch link is valid :) [...] Will there be a simple reference counted WideString on all platforms as the current FPC UnicodeString is? Why does there need to be a widestring type if there is unicodestring? Is there some reason why unicodestring wouldn't work for you? IIRC you already have a MSESTring alias that you use everywhere, so it should be a matter of just changing what the alias points to depending on compiler version? WideString could be reference counted on all platforms and the current not reference counted Windows WideString could be renamed to OLEString in order This would break a lot of Delphi and COM code, and is IMHO not smart. If for some reason the old widestring implemention needs to continue (it is redundant if unicodestring is there?), it should get a different identifier. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Mon, Nov 9, 2009 at 6:30 PM, Martin Schreiber fp...@bluewin.ch wrote:
WideString could be reference counted on all platforms and the current not reference counted Windows WideString could be renamed to OLEString in order that applications which don't need different encodings in strings don't suffer from the performance and memory requirement penalty of the complicated multi-encoding-multi-char-size string.

You can use the following:

    // in your project:
    uses mytypes; // use the unit in your project

    // declarations provided by mytypes (placement assumed):
    {$ifdef MSWindows}
    type
      OleString = WideString;     // keeps the original Windows WideString
      WideString = UnicodeString; // from here on WideString is reference counted
    {$endif}

thanks, dmitry
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
On Monday 09 November 2009 16:34:52 Marco van de Voort wrote:
In our previous episode, Martin Schreiber said: Will there be a simple reference counted WideString on all platforms as the current FPC UnicodeString is?
Why does there need to be a widestring type if there is unicodestring? Is there some reason why unicodestring wouldn't work for you? IIRC you already have a msestring alias that you use everywhere, so it should be a matter of just changing what the alias points to depending on compiler version?

Performance and simplicity. MSEgui does not need the overhead of multi-encoding/multi-charsize. At the moment msestring = UnicodeString for FPC 2.4, which is perfect. I fear FPC will drop this simple solution, where it was ahead of Delphi.

WideString could be reference counted on all platforms and the current not reference counted Windows WideString could be renamed to OLEString in order
This would break a lot of Delphi and COM code, and is IMHO not smart. If for some reason the old widestring implementation needs to continue (it is redundant if unicodestring is there?), it should get a different identifier.

No problem with me. What will be the new name? Although I find it strange that WideString is reference counted on Linux while the same WideString type is not reference counted on Windows. For porting COM code one could define WideString = OLEString or OLEString = WideString to be Delphi compatible.

Martin
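The idea of pointing one alias at a different string type depending on the compiler version can be sketched roughly as follows. This is only an illustration, not MSEgui's actual source: the unit name stralias and the symbol HAS_UNICODESTRING are invented, and the WideString fallback for older compilers is an assumption. FPC_FULLVERSION is FPC's predefined version macro (20400 for 2.4.0).

```pascal
unit stralias; // hypothetical unit name

{$ifdef FPC}
  {$mode objfpc}{$H+}
  // FPC 2.4 and later: use the reference counted UnicodeString.
  {$if FPC_FULLVERSION >= 20400}
    {$define HAS_UNICODESTRING} // hypothetical symbol
  {$endif}
{$endif}

interface

type
{$ifdef HAS_UNICODESTRING}
  msestring = UnicodeString;
{$else}
  msestring = WideString; // assumed fallback for older compilers
{$endif}

implementation

end.
```

Code that uses msestring everywhere then only needs this one unit changed when the underlying type has to move.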
Re: cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
In our previous episode, Martin Schreiber said:
This would break a lot of Delphi and COM code, and is IMHO not smart. If for some reason the old widestring implementation needs to continue (it is redundant if unicodestring is there?), it should get a different identifier.
No problem with me. What will be the new name?

I don't know if there will be a Kylix widestring then. It is not my decision.

Although I find it strange that WideString is reference counted on Linux while the same WideString type is not reference counted on Windows. For porting COM code one could define WideString = OLEString or OLEString = WideString to be Delphi compatible.

The Delphi version was first. Since people also used it for non-COM purposes, Borland added it to Kylix. Kylix is dead, which makes the ref counted widestring extinct in the Borland line. Delphi COM usage gets the widestring identifier (IMHO) because that is what it was originally conceived for.
Re: [fpc-devel] Freepascal 2.4.0rc1 released
Marco van de Voort wrote:
We have placed the first release-candidate of the Free Pascal Compiler version 2.4.0 on our ftp-servers. You can help improve the upcoming 2.4.0 release by downloading and testing this release. If you want you can report what you have done here: http://wiki.freepascal.org/Testers_2.4.0

Hi, I'm the maintainer of TSqlite3Dataset, and after the 2.4 branch was created I fixed two bugs in Calculated/Lookup fields support. Is there time to merge these fixes, or is it too late?

Luiz