Re: [fpc-devel] HTML string to TFPColor
How about this? To me it is more readable.

type
  THtmlColorName = (
    hcnUnknown, hcnWhite, hcnSilver, hcnGray, hcnBlack,
    hcnRed, hcnMaroon, hcnYellow, hcnOlive, hcnLime, hcnGreen,
    hcnAqua, hcnTeal, hcnBlue, hcnNavy, hcnFuchsia, hcnPurple);

function TryStrToHtmlColorName(const S: String; out AName: THtmlColorName): Boolean;
begin
  Result := True;
  case LowerCase(S) of
    'white'  : AName := hcnWhite;
    'silver' : AName := hcnSilver;
    'gray'   : AName := hcnGray;
    'black'  : AName := hcnBlack;
    'red'    : AName := hcnRed;
    'maroon' : AName := hcnMaroon;
    'yellow' : AName := hcnYellow;
    'olive'  : AName := hcnOlive;
    'lime'   : AName := hcnLime;
    'green'  : AName := hcnGreen;
    'aqua'   : AName := hcnAqua;
    'teal'   : AName := hcnTeal;
    'blue'   : AName := hcnBlue;
    'navy'   : AName := hcnNavy;
    'fuchsia': AName := hcnFuchsia;
    'purple' : AName := hcnPurple;
  else
    AName := hcnUnknown;
    Result := False;
  end;
end;

On 2017-07-23 16:46, Bart wrote:
On 7/23/17, Bart wrote:

Hopefully less eye-sorrow ...

resourcestring
  SInvalidHtmlColor = '"%s" is not a valid Html color';

type
  THtmlColorName = (
    hcnWhite, hcnSilver, hcnGray, hcnBlack,
    hcnRed, hcnMaroon, hcnYellow, hcnOlive, hcnLime, hcnGreen,
    hcnAqua, hcnTeal, hcnBlue, hcnNavy, hcnFuchsia, hcnPurple);

const
  HtmlColorNameToFPColorMap: array[THtmlColorName] of TFPColor = (
    (red: $ff; green: $ff; blue: $ff; alpha: alphaOpaque), //hcnWhite
    (red: $c0; green: $c0; blue: $c0; alpha: alphaOpaque), //hcnSilver
    (red: $80; green: $80; blue: $80; alpha: alphaOpaque), //hcnGray
    (red: $00; green: $00; blue: $00; alpha: alphaOpaque), //hcnBlack
    (red: $ff; green: $00; blue: $00; alpha: alphaOpaque), //hcnRed
    (red: $80; green: $00; blue: $00; alpha: alphaOpaque), //hcnMaroon
    (red: $ff; green: $ff; blue: $00; alpha: alphaOpaque), //hcnYellow
    (red: $80; green: $80; blue: $00; alpha: alphaOpaque), //hcnOlive
    (red: $00; green: $ff; blue: $00; alpha: alphaOpaque), //hcnLime
    (red: $00; green: $80; blue: $00; alpha: alphaOpaque), //hcnGreen
    (red: $00; green: $ff; blue: $ff; alpha: alphaOpaque), //hcnAqua
    (red: $00; green: $80; blue: $80; alpha: alphaOpaque), //hcnTeal
    (red: $00; green: $00; blue: $ff; alpha: alphaOpaque), //hcnBlue
    (red: $00; green: $00; blue: $80; alpha: alphaOpaque), //hcnNavy
    (red: $ff; green: $00; blue: $ff; alpha: alphaOpaque), //hcnFuchsia
    (red: $80; green: $00; blue: $80; alpha: alphaOpaque)  //hcnPurple
  );

function TryStrToHtmlColorName(const S: String; out AName: THtmlColorName): Boolean;
begin
  Result := False;
  case LowerCase(S) of
    'white'  : begin Result := True; AName := hcnWhite; end;
    'silver' : begin Result := True; AName := hcnSilver; end;
    'gray'   : begin Result := True; AName := hcnGray; end;
    'black'  : begin Result := True; AName := hcnBlack; end;
    'red'    : begin Result := True; AName := hcnRed; end;
    'maroon' : begin Result := True; AName := hcnMaroon; end;
    'yellow' : begin Result := True; AName := hcnYellow; end;
    'olive'  : begin Result := True; AName := hcnOlive; end;
    'lime'   : begin Result := True; AName := hcnLime; end;
    'green'  : begin Result := True; AName := hcnGreen; end;
    'aqua'   : begin Result := True; AName := hcnAqua; end;
    'teal'   : begin Result := True; AName := hcnTeal; end;
    'blue'   : begin Result := True; AName := hcnBlue; end;
    'navy'   : begin Result := True; AName := hcnNavy; end;
    'fuchsia': begin Result := True; AName := hcnFuchsia; end;
    'purple' : begin Result := True; AName := hcnPurple; end;
  end;
end;

{ Try to translate HTML color code into TFPColor
  Supports following formats
    '#rgb'
    '#rrggbb'
    W3C Html color name
}
function TryHtmlToFPColor(const S: String; out FPColor: TFPColor): Boolean;

  function TryHexStrToWord(const Hex: String; out W: Word): Boolean;
  var
    Code: Integer;
  begin
    Val('$'+Hex, W, Code);
    Result := (Code = 0);
    if not Result then W := 0;
  end;

var
  AName: THtmlColorName;
begin
  Result := False;
  FPColor.red := 0;
  FPColor.green := 0;
  FPColor.blue := 0;
  FPColor.alpha := alphaOpaque;
  if (Length(S) = 0) then
    Exit;
  if (S[1] = '#') then
  begin
    if Length(S) = 4 then
    begin // #rgb
      Result := (TryHexStrToWord(S[2]+S[2], FPColor.red) and
                 TryHexStrToWord(S[3]+S[3], FPColor.green) and
                 TryHexStrToWord(S[4]+S[4], FPColor.blue));
    end
    else if Length(S) = 7 then
    begin // #rrggbb
      Result := (TryHexStrToWord(S[2]+S[3], FPColor.red) and
                 TryHexStrToWord(S[4]+S[5], FPColor.green) and
                 TryHexStrToWord(S[6]+S[7], FPColor.blue));
    end;
  end
  else
  begin
    Result := TryStrToHtmlColorName(S, AName);
    if Result then
Re: [fpc-devel] Math expressions in wiki
Apparently, the FPC wiki uses the same software as Wikipedia; if so, the following links may be useful.

https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Mathematics#Using_HTML
https://en.wikipedia.org/wiki/Wikipedia:Mathematical_symbols
https://en.wikipedia.org/wiki/List_of_mathematical_symbols

On 2017-04-14 13:25, Werner Pamler wrote:

Does anybody know how to write mathematical expressions in the wiki? I would like to write an article on fpc's NumLib, but I would only want to begin this activity when I know how to enter complex mathematical formulas like integrals etc such that they are displayed decently.

___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Proof of Concept ARC implementation
On 2014-10-27 00:00, Sven Barth wrote:
On 26.10.2014 12:17, Kostas Michalopoulos wrote:
On Sun, Oct 26, 2014 at 8:32 AM, Sven Barth pascaldra...@googlemail.com wrote:

Definitely not. We are in Pascal and there such directives are placed afterwards.

How about these:

1) 'record' --- 'packed record'
2) AVariable: Integer absolute AnotherVariable.
Re: [fpc-devel] Proof of Concept ARC implementation
On 2014-10-27 09:39, Michael Van Canneyt wrote:
On Mon, 27 Oct 2014, ListMember wrote:
On 2014-10-27 00:00, Sven Barth wrote:
On 26.10.2014 12:17, Kostas Michalopoulos wrote:
On Sun, Oct 26, 2014 at 8:32 AM, Sven Barth pascaldra...@googlemail.com wrote:

Definitely not. We are in Pascal and there such directives are placed afterwards.

how about these:

1) 'record' --- 'packed record'
2) AVariable: Integer absolute AnotherVariable.

Weak is a modifier, just as static, cdecl, external etc. In Pascal, modifiers are placed after the thing they modify. 'absolute' modifies AVariable. That just confirms the above rule.

To me, 'weak' is modifying the type rather than the variable itself (much like 'packed record'). So, it would make more sense to me if we wrote it like

FVariable: weak TSometype;
Re: [fpc-devel] Community site
On 2014-02-10 00:02, Florian Klämpfl wrote:

The lazarus forum can be reached at http://forum.lazarus.freepascal.org/index.php?action=forum

Shouldn't this link go directly to the forum? http://forum.lazarus.freepascal.org/
Re: [fpc-devel] Pass compiler options to generics [was Delphi anonymous methods]
On 2013-03-06 13:13, Alexander Klenin wrote:
On Wed, Mar 6, 2013 at 10:00 PM, Michael Schnell mschn...@lumino.de wrote:
On 03/05/2013 05:17 PM, Alexander Klenin wrote:

1) Make sure 'is' and 'as' work with generic types -- maybe they already do?

'is' the generic type and/or 'is' a certain specialization?

Yes, and also generic parameters.

procedure TGenegicTProc;
begin
  if T is String then ...
end;

And, from then on, we're going to have to typecast each and every occurrence of T. Even if this were possible for every type of T, I wonder how doing that affects the performance of generics.

That's why I like the idea of passing compiler directives to generics to help eliminate parts of the code (from the generic implementation) at compile time.
[fpc-devel] Pass compiler options to generics [was Delphi anonymous methods]
On 2013-03-05 12:37, Sven Barth wrote:

Thanks, I try my best :)

I know you do.

And, since generics have also been mentioned in this thread, here is something I'd like to table/mention (or, rather, use you as a sounding board, if I may).

Sometimes writing generic routines that are truly generic is hard, since the types they need to handle can be tough to genericise (strings, ordinals, objects etc. are not always treatable in generic routines in the same way).

The solution to this might be to write different generic routines for each offending type, but this means duplicating plenty of code in the process, which in turn defeats quite a bit of the purpose of having generics.

This brings me to wonder if it would be possible to pass some constant (or set of constants, or something similar) to a generic routine such that this/these option(s) would be treated as compile options within the implementation of the generic routine.
Re: [fpc-devel] Delphi anonymous methods
On 2013-03-02 21:55, Sven Barth wrote:

- (in context with the above point) couple the support for the anonymous functions to a new modeswitch (enabled by default in mode delphi), but allow reference to procvars in non-Delphi modes (this way you'd only be able to use nested functions, but later on the shorter syntax should be used there as well)

This paragraph went way over my head. But, it sort of reminded me to ask about something else:

Are anonymous methods (going to be) compatible with event methods? I.e., are they 'procedure of object' (or 'function of object') kind of constructs?

The reason I am asking is this: There have been times when I could kill to be able to assign (temporarily) a local procedure/function to an object's event (say, OnClick) so that I could capture the event's outcome right there within the body of the active routine.

Would this be possible?
Re: [fpc-devel] Forwarded message about FPC status
On 2012-12-22 11:48, Michael Van Canneyt wrote:
On Sat, 22 Dec 2012, ListMember wrote:
On 2012-12-22 00:27, Sven Barth wrote:
On 21.12.2012 22:20, ListMember listmem...@letterboxes.org wrote:

Can you (or someone else, of course) think of a better search string to locate it?

Go to View Issues, click on the + before the search box, click in the appearing entries in the top left for reporter and select the user Inoussa OUEDRAOGO in the list (strangely the user exists twice, I used the first one) and click on Apply Filter. The second entry should be the correct one (you should be able to judge this from the issue's description).

Thank you for that detailed navigation; I got it now. [ http://mantis.freepascal.org/view.php?id=22909 ]

Does anyone know if the license issue has been discussed in any public maillist/wiki etc.? The reason I am asking is this: Having read (now and several times in the past) unicode.org's license [ http://www.unicode.org/copyright.html#Exhibit1 ] I simply cannot see what it is that is so (or, rather, at all) restrictive.

It would require every FPC made program to include the unicode license. By itself maybe not a problem, but this contrasts with the fact that for years, you could make an FPC program without any additional licenses, if you didn't use any third-party libraries. Inclusion in the RTL would make this an obligation for every FPC program.

However, last status/opinion is that this is only so if you were to copy the files verbatim. If the data contained in the files is somehow recoded, then it would probably not apply. We didn't get any answers to our inquiries. But we found that Delphi also uses these files, and they put forward the above argument on the Delphi forums when Paul Ishenin inquired. It boils down to: Only the form is copyrighted, not the actual data.

We hope they are right, otherwise every Delphi program as of Delphi 2009 is in violation of the unicode license :-)

Note that I am not a lawyer, the above are therefore not rigorous legal truths.

I am not a lawyer either, but I did notice that they were quite pedantic (or, a better word might be meticulous) with their wording: In the license text they state that Data Files *or* Software must contain their license text.

Unicode.org guys are as much coders as linguists, so I believe they have used 'or' (as opposed to 'and' or 'and/or') for a reason.

So, as an addition to what you have said, my take is that including their copyright in the data alone will suffice --programs/software need not bear the same text.

[Plus, of course, there should be a clear statement that it is 'modified'. And, the documentation should bear the license text. The wiki should do.]
Re: [fpc-devel] Forwarded message about FPC status
On 2012-12-21 14:26, Michael Van Canneyt wrote:

- Inoussa has made a native unicode string manager. A large effort.

Is this code publicly available somewhere?
Re: [fpc-devel] Forwarded message about FPC status
On 2012-12-21 22:29, Michael Van Canneyt wrote:
On Fri, 21 Dec 2012, ListMember wrote:
On 2012-12-21 14:26, Michael Van Canneyt wrote:

- Inoussa has made a native unicode string manager. A large effort.

Is this code publicly available somewhere?

It's attached to a bugreport in Mantis somewhere.

Michael.

I did a search for 'Inoussa' and I got these 4 items --all closed. None of them contains any significantly sized attachment.

cpnewstr's charray_to_ansistr conversion is not ok (due to codepage parameter)
http://mantis.freepascal.org/view.php?id=17754

Access Violation when assigning Self to local interface Reference in constructor
http://mantis.freepascal.org/view.php?id=16901

Apache Bindings - apr.pas
http://mantis.freepascal.org/view.php?id=11460

Memory leak with interfaces
http://mantis.freepascal.org/view.php?id=7281

Can you (or someone else, of course) think of a better search string to locate it?
Re: [fpc-devel] Forwarded message about FPC status
On 2012-12-22 00:27, Sven Barth wrote:
On 21.12.2012 22:20, ListMember listmem...@letterboxes.org wrote:

Can you (or someone else, of course) think of a better search string to locate it?

Go to View Issues, click on the + before the search box, click in the appearing entries in the top left for reporter and select the user Inoussa OUEDRAOGO in the list (strangely the user exists twice, I used the first one) and click on Apply Filter. The second entry should be the correct one (you should be able to judge this from the issue's description).

Thank you for that detailed navigation; I got it now. [ http://mantis.freepascal.org/view.php?id=22909 ]

Does anyone know if the license issue has been discussed in any public maillist/wiki etc.? The reason I am asking is this: Having read (now and several times in the past) unicode.org's license [ http://www.unicode.org/copyright.html#Exhibit1 ] I simply cannot see what it is that is so (or, rather, at all) restrictive.
[fpc-devel] Tor/Onion
Re: [fpc-devel] Tor/Onion
I am sorry. I didn't mean to send it here.

But, now that I did: Does anyone know of a Tor/Onion protocol implementation in Pascal?

Cheers

On 2009-09-23 13:26, listmember wrote:
Re: [fpc-devel] RTTI and Attributes in Delphi 2010
On 2009-08-17 01:20, Graeme Geldenhuys wrote:

An exact example would be very helpful. And the old property mapping to database field doesn't count, because that's a design preference and such mappings are not always appropriate or possible.

What I have in mind isn't quite an example, but I'd like to voice a need I have felt many times over the years, and attributes seem to fulfill it.

Some of these attributes would include 'help', 'copyright' and things like that.

I would use the attributes to display help for any property in the property editor at design time. I.e., instead of having to read a help file, by pressing a button next to that property I'd see a popup explaining it. The information provided in these 'help's could also be extracted to generate a help file.

Same thing for 'copyright', of course, and similar to 'help' it would apply to the class as well as properties and code.

I'd like these to be keywords and, similar to XML (or like Pascal's 'begin..end'), they would also have a closing tag/keyword; so that, for example, 'help_begin' would have 'help_end' as its closing tag/keyword.

Other attributes could be references to external systems such as databases, i.e. field mappings to databases.

While I would like these to be available at both design time and run time, with appropriate compiler switches we could strip them out of the final executable.
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 10:19, Mattias Gaertner wrote:
On Sat, 22 Nov 2008 23:05:43 +0200 listmember[EMAIL PROTECTED] wrote:

Is there a way to determine how much memory is consumed by strings by a running application? I'd like to know this, in particular, for FPC and Lazarus --to begin with. And, the reason I'd like to know this is this: Whenever I suggest that char size be increased to 4, the idea gets opposed on the grounds that it will need huge memory --4 times as much. There's of course some merit in that argument, but I have no idea what it is '4 times' of. This is not very engineer-like --it being unmeasured. Can anyone suggest a way to measure the memory load caused by strings?

The exact amount depends on the application, but think about loading text files of 100mb into strings. This will need at least the 100mb plus the overhead for each string (at least 12 bytes). With 2 byte chars an extra of 100mb would be needed and with 4 byte chars 300mb additional mem would be needed.

For example the lazarus IDE typically holds 50 to 200mb sources in memory. If this would be changed to unicodestring (2 byte per char) then the IDE would need 50 to 200mb more memory. And because many time consuming tasks are already bound by the memory bandwidth of current computers, the IDE would become twice as slow. Do the math for 4 byte per char.

What I had in mind wasn't to store the string data on disk in UTF-32 (or UCS-4); it would still be UTF-8 or whatever. I am only considering the in-memory representation being UTF-32 (or UCS-4). This way, loading from and saving to disk would hardly be affected, yet in-memory operations would be a lot faster and simpler.
Re: [fpc-devel] Memory consumed by strings
I am only considering in memory representation being UTF-32 (or UCS-4).

What do you mean with 'memory representation'?

That each char in a string in memory would be 4 bytes (or more); yet, when saved on disk (or transmitted across the net etc.) it would be UTF-8 compressed. IOW, no compression applied to in-memory strings.
Re: [fpc-devel] Memory consumed by strings
Actually, load times do not seem to be linear at all. A file about 3 times larger seems to take only about twice as long.

I did one very simple test using 2 text files:

File 1: 384 MB (403,248,710 bytes)
File 2: 120 MB (126,680,448 bytes)

with the code below:

procedure TForm1.Button1Click(Sender: TObject);
var
  InitialValue1: Int64; //Initial PerformanceCounter
  Divisor1: Int64;      //Performance CounterFrequency
  CurrentValue1: Int64; //Current PerformanceCounter
  Time1: double;
  Time2: double;
  Stream1: TMemoryStream;
  Index1: integer;
begin
  Memo1.Lines.Clear;
  QueryPerformanceFrequency(Divisor1);
  Index1 := 0;
  while Index1 < 100 do
  begin
    QueryPerformanceFrequency(CurrentValue1);

    QueryPerformanceCounter(InitialValue1);
    Stream1 := TMemoryStream.Create;
    Stream1.LoadFromFile(FILE_1);
    Stream1.Free;
    QueryPerformanceCounter(CurrentValue1);
    Time1 := (CurrentValue1 - InitialValue1) / Divisor1;

    QueryPerformanceCounter(InitialValue1);
    Stream1 := TMemoryStream.Create;
    Stream1.LoadFromFile(FILE_2);
    Stream1.Free;
    QueryPerformanceCounter(CurrentValue1);
    Time2 := (CurrentValue1 - InitialValue1) / Divisor1;

    Memo1.Lines.Add(Format('[400 MB: %3.3ns] [100 MB: %3.3ns]', [Time1, Time2]));
    Inc(Index1);
  end;
end;

Output:

[400 MB: 0.514s] [100 MB: 0.241s]
[400 MB: 0.535s] [100 MB: 0.239s]
[400 MB: 0.532s] [100 MB: 0.252s]
[400 MB: 0.532s] [100 MB: 0.245s]
[400 MB: 0.541s] [100 MB: 0.240s]
[400 MB: 0.533s] [100 MB: 0.240s]
[400 MB: 0.540s] [100 MB: 0.240s]
[400 MB: 0.532s] [100 MB: 0.245s]
[400 MB: 0.532s] [100 MB: 0.234s]
[400 MB: 0.538s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.242s]
[400 MB: 0.531s] [100 MB: 0.242s]
[400 MB: 0.585s] [100 MB: 0.252s]
[400 MB: 0.531s] [100 MB: 0.243s]
[400 MB: 0.531s] [100 MB: 0.289s]
[400 MB: 0.569s] [100 MB: 0.240s]
[400 MB: 0.532s] [100 MB: 0.235s]
[400 MB: 0.535s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.242s]
[400 MB: 0.532s] [100 MB: 0.239s]
[400 MB: 0.531s] [100 MB: 0.241s]
[400 MB: 0.532s] [100 MB: 0.239s]
[400 MB: 0.532s] [100 MB: 0.245s]
[400 MB: 0.536s] [100 MB: 0.239s]
[400 MB: 0.534s] [100 MB: 0.256s]
[400 MB: 0.547s] [100 MB: 0.242s]
[400 MB: 0.535s] [100 MB: 0.261s]
[400 MB: 0.530s] [100 MB: 0.232s]
[400 MB: 0.541s] [100 MB: 0.239s]
[400 MB: 0.533s] [100 MB: 0.243s]
[400 MB: 0.535s] [100 MB: 0.244s]
[400 MB: 0.530s] [100 MB: 0.231s]
[400 MB: 0.540s] [100 MB: 0.240s]
[400 MB: 0.582s] [100 MB: 0.330s]
[400 MB: 0.557s] [100 MB: 0.231s]
[400 MB: 0.539s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.230s]
[400 MB: 0.539s] [100 MB: 0.243s]
[400 MB: 0.531s] [100 MB: 0.246s]
[400 MB: 0.535s] [100 MB: 0.240s]
[400 MB: 0.532s] [100 MB: 0.279s]
[400 MB: 0.609s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.249s]
[400 MB: 0.537s] [100 MB: 0.239s]
[400 MB: 0.531s] [100 MB: 0.242s]
[400 MB: 0.530s] [100 MB: 0.240s]
[400 MB: 0.535s] [100 MB: 0.238s]
[400 MB: 0.532s] [100 MB: 0.241s]
[400 MB: 0.536s] [100 MB: 0.242s]
[400 MB: 0.532s] [100 MB: 0.240s]
[400 MB: 0.534s] [100 MB: 0.230s]
[400 MB: 0.545s] [100 MB: 0.235s]
[400 MB: 0.538s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.235s]
[400 MB: 0.536s] [100 MB: 0.229s]
[400 MB: 0.540s] [100 MB: 0.232s]
[400 MB: 0.540s] [100 MB: 0.243s]
[400 MB: 0.539s] [100 MB: 0.234s]
[400 MB: 0.540s] [100 MB: 0.230s]
[400 MB: 0.539s] [100 MB: 0.261s]
[400 MB: 0.535s] [100 MB: 0.242s]
[400 MB: 0.529s] [100 MB: 0.234s]
[400 MB: 0.538s] [100 MB: 0.234s]
[400 MB: 0.538s] [100 MB: 0.244s]
[400 MB: 0.535s] [100 MB: 0.242s]
[400 MB: 0.529s] [100 MB: 0.239s]
[400 MB: 0.532s] [100 MB: 0.251s]
[400 MB: 0.631s] [100 MB: 0.236s]
[400 MB: 0.535s] [100 MB: 0.242s]
[400 MB: 0.531s] [100 MB: 0.243s]
[400 MB: 0.531s] [100 MB: 0.239s]
[400 MB: 0.531s] [100 MB: 0.232s]
[400 MB: 0.543s] [100 MB: 0.239s]
[400 MB: 0.528s] [100 MB: 0.232s]
[400 MB: 0.538s] [100 MB: 0.242s]
[400 MB: 0.537s] [100 MB: 0.233s]
[400 MB: 0.537s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.230s]
[400 MB: 0.543s] [100 MB: 0.242s]
[400 MB: 0.533s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.253s]
[400 MB: 0.537s] [100 MB: 0.243s]
[400 MB: 0.547s] [100 MB: 0.238s]
[400 MB: 0.539s] [100 MB: 0.233s]
[400 MB: 0.545s] [100 MB: 0.257s]
[400 MB: 0.572s] [100 MB: 0.318s]
[400 MB: 0.563s] [100 MB: 0.238s]
[400 MB: 0.536s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.249s]
[400 MB: 0.531s] [100 MB: 0.242s]
[400 MB: 0.534s] [100 MB: 0.241s]
[400 MB: 0.532s] [100 MB: 0.238s]
[400 MB: 0.537s] [100 MB: 0.241s]
[400 MB: 0.616s] [100 MB: 0.253s]
[400 MB: 0.536s] [100 MB: 0.228s]
[400 MB: 0.540s] [100 MB: 0.244s]
[400 MB: 0.539s] [100 MB: 0.237s]
[400 MB: 0.536s] [100 MB: 0.241s]
[400 MB: 0.539s] [100 MB: 0.236s]
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 13:07, Graeme Geldenhuys wrote:
On Sun, Nov 23, 2008 at 12:29 PM, listmember[EMAIL PROTECTED] wrote:

What I am curious about is: 4 times of what?

RAM, Random Access Memory, DIMMs, those little green sticks you shove into the motherboard. :-)

:)
Re: [fpc-devel] Memory consumed by strings
However, you may hack into RTL at the NewAnsiString / NewWideString / NewUnicodeString procedures and install hooks that will record the number of bytes requested. That shouldn't be too difficult to do.

This is what I was looking for. Thank you.
Re: [fpc-devel] Memory consumed by strings
Do a 'find declaration' on an identifier that does not exist. This will explore all units of the uses section.

Now I see what you mean.

But, isn't this a design choice: caching all sources in memory for speed reasons, as opposed to opening and closing each file on demand?

Still. If that is how it works, it is how it works.
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 13:49, Jonas Maebe wrote:
On 23 Nov 2008, at 12:35, listmember wrote:

But, isn't this a design-choice; caching all sources in memory for speed reasons, as opposed to on-demand opening and closing each file.

For very large projects, that should probably be done anyway at some point. But even in that case, using a more memory-efficient string type enables you to keep more data in memory and hence potentially obtain better performance.

The last time I joined a relevant discussion, I was told that worrying about a native UCS-4 string type would be pointless, simply because that sort of thing is really needed for word processors only.

Now, I have been informed that Lazarus (and perhaps other IDEs) use upwards of 50 MB of string space just to do one of their basic operations.

That leaves me wondering how much performance we lose in endlessly decompressing UTF-8 data, instead of using, say, UCS-4 strings.
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 14:10, Daniël Mantione wrote:

Therefore, any other encoding is a waste of memory and does not gain you any speed. For that reason, I don't see the compiler switch from 8-bit processing either.

I nearly fully agree with you. Except for when a string constant needs to contain non-ASCII chars. What do we do in these cases?

Only if you need to process characters (rather than pass them on), UTF-32 is a lot faster and simpler.

Yes.

If I knew how to write this patch, I'd be working on it right now.
Re: [fpc-devel] Memory consumed by strings
I thought my example described just that. If strings use 4 bytes per char then ASCII text will need 4 times more memory.

I am not disputing that.

What I am curious about is: 4 times of what?

It is not hard to tell that an app that works with text files (such as Lazarus) will consume 4 times more memory per file loaded.

But, how much memory does, say, Lazarus --itself-- consume specifically for string storage when run for the first time?

This is what I am after.
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 12:50, Jonas Maebe wrote:
On 23 Nov 2008, at 11:29, listmember wrote:

It is not hard to tell that an app that works with text files (such as Lazarus) will consume 4 times more memory per file loaded. But, how much memory does, say, Lazarus --itself-- consume specifically for string storage when run for the first time?

From Mattias' original answer: For example the lazarus IDE typically holds 50 to 200mb sources in memory. I.e., at least 4 times 50 to 200mb.

This is my thick-day. So, permit me to ask this: Are you really saying that strings occupy 50 MB of Lazarus's memory footprint?

I just checked (using Process Explorer, under Windows) and this is what I see:

Working set: 2,216 K
Peak Working set: 26,988 K

I can't see where that 50 MB fits into that.
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 14:34, Mattias Gaertner wrote:
On Sun, 23 Nov 2008 14:11:50 +0200 listmember[EMAIL PROTECTED] wrote:

That leaves me wondering how much do we lose performance-wise in endlessly decompressing UTF-8 data, instead of using, say, UCS-4 strings.

I'm wondering what you mean with 'endlessly decompressing UTF-8 data'.

I am referring to going to the nth character in a string. With UTF-8 it is no longer simple arithmetic and an index operation. You have to start from the beginning and iterate until you get to your character --at every step, working out whether the current one is 1, 2, 3 or 4 bytes long. Doing this is decompression.

You have to make a compromise between memory, ease of use and compatibility. There is no solution without drawbacks. If you want to process large 8bit text files then UTF-8 is better. If you want to paint glyphs then normalized UTF-32 is better. If you want some unicode with some mem overhead and some easy usage and have compiler support for some compatibility then UTF-16 is better.

Do we have to think in terms of encodings (which are ways of compressing text) when what we actually mean is 1-byte, 2-byte and 4-byte per char strings?
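To make the "iterate from the start" point concrete, here is a minimal sketch of locating the nth code point in a UTF-8 string. The names SeqLen and Utf8ByteOffset are made up for this illustration and are not RTL routines; the sketch also assumes well-formed UTF-8 and treats a malformed lead byte as a 1-byte sequence.

```pascal
program utf8index;
{$mode objfpc}{$H+}

// Hypothetical helper: byte length of the UTF-8 sequence whose lead byte is B.
// Assumption of this sketch: malformed lead bytes count as length 1.
function SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Hypothetical helper: 1-based byte offset of the N-th code point (N is 1-based),
// or 0 if S holds fewer than N code points. Cost grows with N, unlike S[N].
function Utf8ByteOffset(const S: AnsiString; N: Integer): Integer;
var
  I, Count: Integer;
begin
  Result := 0;
  I := 1;
  Count := 0;
  while I <= Length(S) do
  begin
    Inc(Count);
    if Count = N then
      Exit(I);
    Inc(I, SeqLen(Ord(S[I])));
  end;
end;

var
  S: AnsiString;
begin
  // 'a', then c-cedilla as the two UTF-8 bytes $C3 $A7, then 'a'
  S := 'a'#$C3#$A7'a';
  WriteLn(Utf8ByteOffset(S, 3));  // third code point starts at byte 4
end.
```

With a fixed 4-byte-per-char array the same lookup is a plain index operation, which is the trade-off being argued about here: constant-time indexing against up to four times the memory for ASCII-heavy text.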
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 14:19, Mattias Gaertner wrote:
On Sun, 23 Nov 2008 13:35:07 +0200 listmember[EMAIL PROTECTED] wrote:

[...]

These dependencies are complex and require exclusive access. The memory belongs to the program, the source files can be changed by anyone. Therefore the files are kept in memory and auto reloaded if they change on disk.

Makes sense. Thank you for explaining it.
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 14:49, Daniël Mantione wrote:
On Sun, 23 Nov 2008, Jonas Maebe wrote:
On 23 Nov 2008, at 13:31, Daniël Mantione wrote:

For an IDE, this is a little bit more complicated. I.e. searching for a ç in a source file needs to find both the composed and the decomposed variant, and in the case of UTF-8, this character can be encoded in 1, 2, 3 or 4 bytes which all need to be found. This is where UTF-16 and UTF-32 start to make sense.

Characters can also be decomposed in UTF-16 and in UTF-32 (for the same reasons as in UTF-8).

I am aware of that, but the combining cedille is not in the easy to process range of UTF-8. In other words, you cannot do if char[i]=combining_cedille in UTF-8. Instead, in UTF-8, you need to make sure the string has enough characters left, and then compare multiple characters. Heck, you even need to take care of the fact that the combining cedille can be encoded in 2, 3 or 4 bytes.

This is one of the million and one small details that one has to keep in mind while programming.

What I think would be more sensible is, instead of using all these variable sizes, to simply use 4-byte/char strings and compose (in the UTF sense) everything into that string.

You do this once, when importing/loading text into your app. And, from then on, everything is just like the good old string --except that it is a 4-byte per char string, instead of 1-byte.

Now, my question is this:

How would I create a 'FourByteString' type, reference counted etc., just like the usual 'String'?

How hard is it? Can someone like me, who does not speak assembler, do it? If so, where do I begin copy-pasting from 'string'?
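One possible starting point, offered only as a sketch: the FPC RTL already declares UCS4Char and UCS4String (a reference-counted dynamic array of UCS4Char) together with conversion routines in the System unit, so a 4-byte-per-char container does not have to be written from scratch. It is not a drop-in replacement for 'String' (it is 0-based, and the usual string routines do not accept it). The program name and sample text below are made up for illustration.

```pascal
program ucs4demo;
{$mode objfpc}{$H+}

var
  U: UnicodeString;
  S4: UCS4String;   // dynamic array of UCS4Char, 0-based
begin
  // Build the word "facade" with a c-cedilla (U+00E7) as its third character.
  U := UnicodeString('fa') + WideChar($E7) + UnicodeString('ade');

  // Convert once, e.g. right after loading the text.
  S4 := UnicodeStringToUCS4String(U);

  // The RTL conversion appends a terminating zero element, hence the "- 1".
  WriteLn('code points: ', Length(S4) - 1);

  // Direct indexing: the third character sits at index 2.
  WriteLn('third code point: ', S4[2]);

  // Convert back when handing the text to the rest of the program.
  WriteLn('round trip ok: ', UCS4StringToUnicodeString(S4) = U);
end.
```

Note that this only gives one array element per code point; composing a base letter and a combining mark into a single code point (the normalization step mentioned above) would still have to be done separately.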
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 15:10, Marco van de Voort wrote:
> In our previous episode, listmember said:
> [..]
> > I'd like to know this, in particular, for FPC and Lazarus --to begin with. And, the reason I'd like to know this is this: Whenever I suggest that char size be increased to 4, the idea gets opposed on the grounds that it will need huge memory --4 times as much.
> That's not the only reason:
> - more memory also means slower copy.

True. But being multiples of 4 bytes may compensate for it. Don't quote me on this though.

> - Most OSes seem to use UTF-8 and UTF-16; with -32 you would be an island, and the average text editor might not be able to read what you write

The answer to that is this:
1) When inputting/outputting text to/from a file or the OS, you use UTF-8 (or whatever is native/required).
2) You do not make UTF-32 mandatory. But it should be there for those (and those cases) that need it.
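A minimal sketch of the "UTF-8 at the boundary, fixed-width cells inside" idea from points 1) and 2), in Python for illustration only. `load_text` and `save_text` are hypothetical names; the `array("I", ...)` cells are 4 bytes wide on common platforms, which is an assumption and not something the language guarantees.

```python
from array import array


def load_text(raw: bytes, encoding: str = "utf-8") -> array:
    """Decode external bytes once, into one integer cell per code point."""
    return array("I", (ord(ch) for ch in raw.decode(encoding)))


def save_text(cells: array, encoding: str = "utf-8") -> bytes:
    """Encode the fixed-width cells back to the external encoding."""
    return "".join(chr(c) for c in cells).encode(encoding)


raw = "Dani\u00ebl \u00e7".encode("utf-8")  # 10 bytes on disk
cells = load_text(raw)                      # 8 cells in memory

print(len(raw), len(cells), cells.itemsize)
print(hex(cells[4]))             # direct indexing: the cell holding 'ë'
print(save_text(cells) == raw)   # the round trip is lossless
```

Inside the program every index is one code point; only `load_text` and `save_text` need to know which encoding the outside world uses.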
Re: [fpc-devel] Memory consumed by strings
On 2008-11-23 19:31, Graeme Geldenhuys wrote:
> At least the good thing about UTF-8 is that you don't have to worry about LE or BE byte orders. UTF-16 and UTF-32 have that nasty issue.

LE/BE only applies when streaming to/from a file/device/network; otherwise life is much simpler with UTF-32.
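The byte-order point can be seen by serializing one character. This is an illustrative Python sketch; `read_utf16` is a hypothetical helper that relies on a leading byte order mark (BOM), which is only one of several ways a stream can declare its byte order.

```python
text = "\u00e7"  # ç, U+00E7

utf8 = text.encode("utf-8")          # identical bytes on every machine
utf16_le = text.encode("utf-16-le")  # low byte first
utf16_be = text.encode("utf-16-be")  # high byte first
utf32_le = text.encode("utf-32-le")
utf32_be = text.encode("utf-32-be")


def read_utf16(stream_bytes: bytes) -> str:
    """Decode UTF-16 data whose first two bytes are a BOM."""
    return stream_bytes.decode("utf-16")


for name, data in (("utf-8", utf8), ("utf-16-le", utf16_le),
                   ("utf-16-be", utf16_be), ("utf-32-le", utf32_le),
                   ("utf-32-be", utf32_be)):
    print(name, data.hex())

# The BOM tells the reader which order was used when the data was written.
print(read_utf16(b"\xff\xfe" + utf16_le) == text)
print(read_utf16(b"\xfe\xff" + utf16_be) == text)
```

In memory the code point is just the integer 0xE7; the ordering question only appears once it is written out as bytes.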
[fpc-devel] Memory consumed by strings
Is there a way to determine how much memory is consumed by strings in a running application?

I'd like to know this, in particular, for FPC and Lazarus --to begin with.

And, the reason I'd like to know this is this: Whenever I suggest that char size be increased to 4, the idea gets opposed on the grounds that it will need huge memory --4 times as much.

There's of course some merit in that argument, but I have no idea what it is '4 times' of. This is not very engineer-like --it being unmeasured.

Can anyone suggest a way to measure the memory load caused by strings?
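One rough way to put a number on the '4 times' claim is to total the payload bytes of a sample of strings under each encoding. The sketch below is Python and purely illustrative: `encoded_sizes` is a hypothetical helper, the sample list is made up, and it measures encoded payload only, not the heap overhead (headers, reference counts, allocator padding) of a running FPC or Lazarus process.

```python
def encoded_sizes(strings):
    """Total payload bytes of the given strings in three encodings."""
    totals = {"utf-8": 0, "utf-16-le": 0, "utf-32-le": 0}
    for s in strings:
        for encoding in totals:
            totals[encoding] += len(s.encode(encoding))
    return totals


# Hypothetical sample: one ASCII identifier and three words with one
# non-ASCII letter each.
sample = ["begin", "Dani\u00ebl", "Stra\u00dfe", "\u0130stanbul"]

sizes = encoded_sizes(sample)
for encoding, total in sizes.items():
    print(encoding, total)
print("utf-32 / utf-8 ratio:", round(sizes["utf-32-le"] / sizes["utf-8"], 2))
```

For mostly-ASCII text such as source code the ratio approaches 4; for text dominated by characters that already need 3 or 4 bytes in UTF-8 it is much lower.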
Re: [fpc-devel] Unicode support - for the 20th time... ;-)
> Ok, two questions for the example above:
> - how do you maintain backward compatibility?
> - how do you load a plain old ansi file?

You could alter the LoadFromFile(), LoadFromStream(), SaveToFile(), SaveToStream() routines like below:

procedure TStringList.LoadFromFile(AFileName: TFilename; const ACharSet: TCharSetKind = csSomeDefaultCharset);

This way, code using these would compile and work as usual. Of course, descendants would need to be modified --a one-time modification.
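The default-parameter approach can be sketched as follows, in Python for illustration. `load_lines` is a hypothetical stand-in for the proposed LoadFromFile(), and Latin-1 is assumed as the 'plain old ansi' default only for the sake of the example; the thread does not say which charset csSomeDefaultCharset would be.

```python
import os
import tempfile


def load_lines(path, charset="latin-1"):
    """Read a text file into a list of lines, decoding with `charset`.

    Callers that never pass `charset` keep the legacy single-byte behaviour.
    """
    with open(path, "r", encoding=charset) as handle:
        return handle.read().splitlines()


with tempfile.TemporaryDirectory() as folder:
    legacy = os.path.join(folder, "legacy.txt")
    modern = os.path.join(folder, "modern.txt")
    with open(legacy, "wb") as handle:
        handle.write("Dani\u00ebl\n".encode("latin-1"))
    with open(modern, "wb") as handle:
        handle.write("Dani\u00ebl\n".encode("utf-8"))

    old_call = load_lines(legacy)           # existing call sites are unchanged
    new_call = load_lines(modern, "utf-8")  # new code opts in explicitly
    print(old_call == new_call)
```

Existing call sites compile and behave as before, while new code can name the charset it expects.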
[fpc-devel] FPC SVN Error and Wow!
I was trying to get a copy of the whole FPC SVN, but at some point, I get the following error:

Server sent unexpected return value (413 Request Entity Too Large) in response to REPORT request for '/svn/fpc/!svn/vcc/default'

Does anyone have any idea why I get this 413 Request Entity Too Large message from the server?

And the Wow! thing is this: From what I have downloaded so far, my side of the copy is:

4.95 GB (5,315,804,192 bytes)
634,779 files
115,577 folders

Wow!
Re: [fpc-devel] FPC SVN Error and Wow!
On 2008-11-07 17:40, Felipe Monteiro de Carvalho wrote:
> What svn command line are you using?

I am using the latest stable TortoiseSVN (Windows).

> Maybe you are trying to download everything, including all tags, all branches, all binary files, etc, etc.

Yes. I deliberately want to do that.

> Usually you only want to download either trunk or the latest fixes branch.

True. But, I want to get a full backup.
Re: [fpc-devel] FPC SVN Error and Wow!
> I mean what command are you using. Something like: svn co blablabla

Since this is a point-n-click GUI thing, there's no command line that I know of. But the URL I am using is this: http://svn.freepascal.org/svn/fpc
Re: [fpc-devel] FPC SVN Error and Wow!
On 2008-11-07 19:12, Aleksa Todorovic wrote:
> So, which action from right-click menu have you chosen? SVN Checkout?

First 'SVN Checkout'; then --after each failure-- 'SVN Update'
Re: [fpc-devel] FPC SVN Error and Wow!
> And the error is quite clear: you're pushing too much data to the server.

I don't have 'write' access. Nor have I made any alterations to the copy I have so far managed to get. What data could I be pushing to the server?

> Try cutting up in several smaller chunks: Browse the server repository, and request only the branches that you need.

Though I am not clear why this may make a difference, I'll try updating 'trunk', 'tags' and 'branches' separately.
Re: [fpc-devel] FPC SVN Error and Wow!
> > What data could I be pushing to the server?
> The state of your current copy of the repository; from this the server can determine what data it should send and what not. If there are too many files, this will amount to a big request, and that could cause such an error response.

Hmm.. That makes sense but I am not sure (how) it applies in my case. Reason: I get this error quite consistently. I mean, even when I first issued 'SVN Checkout', when there was nothing on my side to worry about diffs and whatnot.

Seeing that it downloaded something like 4 GByte of data and then (and, only then) stopped due to a server error, it implies (to me at least) that there is something wrong on the server side.

Anyway, thanks to you pointing out the likelihood of me affecting the server somehow, I did some googling and it seems this is not such an uncommon issue. Apparently, it has to do with Apache settings. IOW, the 413 Request Entity Too Large problem can be solved by moving SSLVerifyClient from the directory level up to the virtual host level, as found in Apache bug 39154.

See: http://svn.haxx.se/users/archive-2008-01/0689.shtml
or http://www.svnforum.org/2017/viewtopic.php?t=3159
and, for the actual Apache bug report and workaround: https://issues.apache.org/bugzilla/show_bug.cgi?id=39154
Re: [fpc-devel] assign constant text to widestring
> DM Example: In Dutch uppercase characters generally do not get tremas: Daniël becomes DANIEL. Should an uppercase routine worry? No, this is a spelling convention, the correct uppercase of ë is Ë, we should not confuse spelling with uppercasing.

No. This is not a spelling convention. It is a rule dictated by the language the word is written in.

If the word Daniël is Dutch, then its uppercase is:

UpperCase(Daniël, langDutch) --> DANIEL

Fine. Yet, if we don't know what language it is written in, then the uppercase is:

UpperCase(Daniël, langUndefined) --> DANIËL

Now.. as I don't know Dutch at all, I wonder what the LowerCase transforms would be for the same uppercased word, DANIEL:

LowerCase(DANIEL, langDutch) --> daniel
or,
LowerCase(DANIEL, langDutch) --> daniël
or both? If both, how do you pick the correct one?

> Example also, in spanish sólo is different than SOLO and meaning is different ( alone only ).

Yes, it is imperative that we know the language the word is in, so that

UpperCase(sólo, langSpanish) --> SÓLO
UpperCase(solo, langSpanish) --> SOLO

Otherwise, we may end up altering the meaning of the text. UpperCase(), LowerCase() should not alter the meaning of the text. This would be a crime in any other context.
Re: [fpc-devel] assign constant text to widestring
On 2008-10-24 02:46, Felipe Monteiro de Carvalho wrote:
> I agree with Daniël on this one. Simplify. ë --> Ë always
> If you need something which takes into consideration the language then build another routine with more parameters.

It's not that simple. How would you uppercase this piece of string

"In Dutch uppercase characters generally do not get tremas: Daniël becomes DANIEL."

correctly unless you knew that the substring "Daniël" is in Dutch while the rest is in English?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Martin Friebe wrote:
> Just to make sure, all of this discussion is based on various collation

No part of this discussion is based on collation.

> I am going to leave out the object question for now. I said all I can say in earlier mails.

That's good. Thank you.

> And also from your comments it appears more a question of collation being stored with the string, substring, or even each char.

Martin, are you doing this on purpose? I mean, are you intentionally driving me up the wall? Seriously. Can't you forget/drop this 'collation' word?! And, then, think a little deeper.

Here is a scenario for you: You have multilanguage text as data. Someone has asked you to search it and see if a certain piece of string (in a given language) exists in it. This search needs to be NOT case-sensitive. How can you do this? Is it doable if TCharacter (or whatever you call it) has no 'language' attribute?

[Note that, here 'TCharacter' isn't necessarily an object; it might as well be a simple record structure.]

> As found in the last mail, there is currently no standard for handling cross-collation in any string function (that is string function, which could be collation based).
> 1) IMHO only few people would need this. For the majority it would be unwanted overhead.
> 2) Within those few, there would be too many different expectations as to what the standard should be. If FPC chose one such standard at will, it would benefit almost no one.

You're still stuck with that wretched word 'collation'.

> The best FPC could do is provide storage, for something that is not handled or obeyed in any function handling the data. This doesn't sound desirable to me. If anyone who needs it will have to implement the functions, then those may add their own storage for it too. Besides, instead of storing it per char, you can use unused unicode as start/stop markers.
> So it can be implemented on top of a string that stores unicode-chars (and chars only, no attributes)

Is there, in Unicode, start-stop markers that denote 'language'?

> > All the others are not an intrinsic part of a char at all --they vary by context.
> Why is language intrinsic to the text? An A is an A in any language. At best language is intrinsic to sorting/comparing (case or non case-sense) text

Comparing is a lot more important an operation than collating --or, rather, collation is achievable only if you can do proper comparisons.

Take this, for example:

if SameText(SomeString, SomeOtherString) then do ...

For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to. If you don't have that information, you might as well not have a SameText() function in FPC.

> > Please note the 'case-INsensitive' keyword there.
> Well I needed an actual example where case sense differs by language (assuming we talk about language using the same charset (not comparing Chinese with English).

Here is a simple example for you:

if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ...

Now.. how are you going to decide that the SameText() function here returns true unless you have information that the substring 'FoolStraße' is in German?

I know that this is a very simple example --that 'ß' exists only in German, and that you could infer that when you met that char. But, this highlights the problem --and there are times when you cannot infer.

> In any case, I can write up several different algorithms how to do that.

Please do. SameText(), for one, will need all the help it can get.

> What I can not do (or what I do not want to do) is to decide which of them other people do want to use.

But, isn't this just that: IOW, you're deciding what other people will NOT want to use if you throw the 'language' attribute (for each char) out of the window.. Or, if this is not what you think of, please clarify by example..
Here is another typical example:

SameText('Istanbul', 'istanbul')

can only return true when both 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani. Otherwise, the same SameText() has to return false.
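What a comparison without any language information does with these two examples can be sketched in Python (illustration only). `same_text_default` is a hypothetical helper built on Unicode's language-independent full case folding; it is not FPC's SameText() and makes no claim about how that routine behaves.

```python
def same_text_default(a: str, b: str) -> bool:
    """Caseless comparison using language-independent full case folding."""
    return a.casefold() == b.casefold()


# The default folding maps 'ß' to 'ss', so this pair matches everywhere,
# whether or not the text is German.
print(same_text_default("I am on FoolStrasse", "I am on FoolStra\u00dfe"))

# 'I' folds to 'i' for every language, so this pair matches too, which is
# the wrong answer if the text is Turkish or Azerbaijani.
print(same_text_default("Istanbul", "istanbul"))

# The Turkish lowercase of 'ISTANBUL' starts with dotless 'ı' (U+0131);
# the default folding does not connect 'I' and 'ı'.
print(same_text_default("ISTANBUL", "\u0131stanbul"))
```

The language-neutral rules give one fixed answer per pair; whether that answer is right depends on information the strings themselves do not carry.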
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> Actually for your example case doesn't matter, as you need to decide if ss = ß

And, this is only valid in German. For all others, the result must either be false, or undefined.

> > Is there, in Unicode, start-stop markers that denote 'language'?
> I do not know, that was why I said unused unicode and implemented on top (as part of the specific app)

As far as I know, there isn't a language delimiter in Unicode.

> IMHO The discussion splits here between:
> 1) How can this be done in a specific app
> 2) what should fpc provide
> as for 2: This would be on top of yet (afaik) missing basic functions such as Compare using collation x (where collation is given as argument to compare, not as part of any string)

I think we're beginning to be on the same page --but, please, can you refrain from using the word 'collation'; every time I see that in this context, I feel a strong need to open the window and shout "collation isn't the most important/used part of a language wrt programming" :)

> > Take this, for example: if SameText(SomeString, SomeOtherString) then do ... For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to.
> I would rather say: There are special cases where you need/want to know which language

Yes. And, if we're on our way to make FPC unicode-enabled, we need to take these special cases into account. Otherwise, we will likely end up with a half baked 'solution'.

> So I do not imply how special or non-special those cases are = you do not always need to know. (continued below on your example)

Why would I ALWAYS need it? Isn't 'needed when necessary' good enough?

> 2) actual compare, you need to normalize all strings before comparing, then compare the normalized string as bytes. normalizing means for each char to decide how to represent it. German ae could be represented as a umlaut for the compare. Or (in German text) you expand all umlaute first.
IOW, SameText() and similar stuff must take normalization into account. But, you do know that 'normalization' is a very rough assumption and can land you in some very embarrassing situations.

Here are 2 words from Turkish.
1) 'sıkıcı' which means 'boring' in English (notice the dotless small 'i's)
2) 'sikici' which means 'fucker' in English

Now, when you normalize these you get 'SIKICI' for both which --then-- you would assume to be the same. Well.. I'd like to see you (or your boss) when you've come up with all those 'fucker's instead of all those 'boring' old farts you were looking for :P

[You might probably think of a German --or some other language-- example]

IOW, what I am trying to tell you is that normalization isn't really useful --it is, IMO, a stopgap solution along the path of Unicode evolution.

> BUT of course there is no way to deal with the ambiguous Busstop

Indeed. For this case, you need to know what language Busstop was written in.

> > > What I can not do (or what I do not want to do) is to decide which of them other people do want to use.
> > But, isn't this just that: IOW, you're deciding what other people will NOT want to use if you throw the 'language' attribute (for each char) out of the window..
> True, I am happy to do that.

NOT I am glad we have met :)

> Why you can always extend this. Store your string in any of the following ways
> 1) every 2nd char is a language attribute, not a char
> 2) store the language attributes in a 2nd string, always pass both strings around

Of course, these and even more creative hacks could be devised. The question is, is the language an attribute of a unicode character?

> > SameText('Istanbul', 'istanbul') can only return true when both 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani.
> ok thats what I did not know.
> But still in most cases it will be fine to do
> SameText('Istanbul', 'istanbul', lGerman)
> SameText('Istanbul', 'istanbul', lTurkish)
> decide at the time of comparing

Well, the prototype I had in mind was:

SameText('Istanbul', 'istanbul', lGerman, lTurkish)

where the defaults for the latter 2 parameters would be lUnknown --this way, people who needn't be bothered about these would not even notice.

> If however the info was stored on the string (or char) what if one was Turkish, the other German ?

SameText('Istanbul', 'istanbul', lGerman, lTurkish)

This one must return FALSE since, in Turkish, the uppercased dotted small 'i' is the DOTTED capital 'i' (i.e. 'İ').

and,

SameText('Istanbul', 'istanbul', lTurkish, lGerman)

will return TRUE since uppercasing both sides results in the same string.
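A comparison that takes one language per argument, along the lines of the four-parameter prototype, could be sketched like this. The code is Python and purely illustrative: `upper_for` and `same_text` are hypothetical names, the language codes "tr", "az" and "de" stand in for lTurkish and friends, the empty string plays the role of lUnknown, and only the dotted/dotless i rule is modelled.

```python
TURKIC = {"tr", "az"}  # languages that distinguish dotted and dotless i


def upper_for(text: str, lang: str = "") -> str:
    """Uppercase `text`, applying the Turkic i rules when `lang` asks for them."""
    if lang in TURKIC:
        # dotted small i -> dotted capital İ, dotless small ı -> plain capital I
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


def same_text(a: str, b: str, lang_a: str = "", lang_b: str = "") -> bool:
    """Caseless comparison where each side is uppercased by its own language."""
    return upper_for(a, lang_a) == upper_for(b, lang_b)


print(same_text("Istanbul", "istanbul"))              # no language given
print(same_text("Istanbul", "istanbul", "de", "tr"))  # second word is Turkish
print(same_text("Istanbul", "istanbul", "tr", "de"))  # first word is Turkish
print(same_text("Istanbul", "\u0131stanbul", "tr", "tr"))
```

Because both language parameters default to the empty string, callers that do not care about language keep the two-argument form.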
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> > [Note that, here 'TCharacter' isn't necessarily an object; it might as well be a simple record structure.]
> AFAIK for most programmers this is not a common task. Most programs need less (one language or codepage)

But, when you're talking unicode, codepage is rather meaningless --isn't it?

> or more (phonetic, semantic, statistical search). Can you explain, why you think that this particular problem requires compiler magic?

See my other reply to Martin Friebe, in another sub thread.

> > Is there, in Unicode, start-stop markers that denote 'language'?
> Is it needed? Are there any unicode characters whose upper/lower depend on language?

Yes. See my other reply to Martin Friebe, in another sub thread.

> > Take this, for example: if SameText(SomeString, SomeOtherString) then do ... For this to work properly, in both 'SomeString' and 'SomeOtherString', you need to know which language *each* character belongs to.
> Comparing texts can be done with various meanings. For example: byte comparison, simple case insensitive comparison, not literal comparison, compare like this library, Which one do you mean?

Byte comparison isn't what I am worried about. In every language, there are pretty well-known and fixed (by now) rules that apply to string comparison. I am referring to those rules.

> [...]
> > Here is a simple example for you: if SameText('I am on FoolStrasse', 'I am on FoolStraße') then do ... Now.. how are you going to decide that SameText() function here returns true unless you have information that the substring 'FoolStraße' is in German?
> The two strings have the same language, but are written with different Rechtschreibung (orthography). You need dictionaries and spelling systems to implement such comparisons. This is beyond a compiler or a RTL.

Are you sure? I was under the impression that Unicode covers these --without needing further data.

> What about loan words?

For all practical purposes, 'loan words' belong to the language they are used in. Except the case where we'd be discussing etymology.
> > SameText('Istanbul', 'istanbul') can only return true when both 'Istanbul' and 'istanbul' are *not* in Turkish/Azerbaijani. Otherwise, the same SameText() has to return false.
> I doubt that it is that easy.

Well.. I never said that it would be that easy. But, if you strip off the language attribute from the character, it will be impossible --or several orders of magnitude harder for those people who need it.

You can, of course, ignore all that. But, then, what is the point of going unicode? We were just fine doing things ANSI-centric.. Weren't we?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> Sorry, but I meant comparing with collation. I did not mean comparing within language context.

How can you do /proper/ collation while ignoring the language context?

> > 1) 'sıkıcı' which means 'boring' in English (notice the dotless small 'i's) 2) 'sikici' which means 'fucker' in English
> Depends how you normalize. Normalize should substitute all *equal* letters (or combination thereof) into one single form. That allows comparing and matching them.

Again, we're not quite on the same page here... What you're referring to is more like 'Text Normalization' [ http://en.wikipedia.org/wiki/Text_normalization ] where you definitely need a very comprehensive dictionary so that '1' is equal to 'one' and '1st' is 'first', etc. (if your language is English).

Whereas, what I am referring to is 'Unicode Normalization' [ http://en.wikipedia.org/wiki/Unicode_normalization ]. This one is much narrower in scope. It deals basically with what I can refer to as 'character glyphs'.

Now, from what I understand from the definitions of 'Unicode Normalization' there are 2 ways of doing it:
1) You decompose both texts (so that you have all 'weird' characters expanded into their combining characters)
2) You compose both texts (so that you have as few or no combining characters)

This is done, obviously, to get them both in the same format --to make life easier to compare. If you do no other operation on these two texts before you compare them, this is called the Canonical Equivalence Test --each 'character glyph' in each text must be the same. For the Canonical Equivalence Test, you do not need to have any 'language' attribute --after all, you're doing a simple byte-wise test.

On the other hand, if you wish to do a broader comparison, a Compatibility Equivalence Test or something other, you will need to do a little more work on those texts: Normalization is one of them.
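The two equivalence tests described above can be sketched with the standard normalization forms. This is an illustrative Python sketch; the helper names are hypothetical, and the decomposed forms NFD and NFKD are used here although the composed forms NFC and NFKC would serve equally well for an equality test.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Equal after canonical decomposition (NFD); no language data needed."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """Equal after compatibility decomposition (NFKD), a broader test."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# Precomposed ñ versus 'n' + U+0303 COMBINING TILDE: canonically equivalent.
print(canonically_equivalent("\u00f1", "n\u0303"))

# The 'fi' ligature U+FB01 versus the letters 'f' and 'i': only the
# compatibility test treats them as the same.
print(canonically_equivalent("\ufb01", "fi"))
print(compatibility_equivalent("\ufb01", "fi"))

# Neither test touches letter case.
print(compatibility_equivalent("Istanbul", "istanbul"))
```

Both tests are independent of language and of case, which is why a caseless comparison still needs a separate case-mapping step on top of them.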
I suggest you take a look at the 'Normalization' heading under http://en.wikipedia.org/wiki/Unicode_normalization

Trouble with the 'Normalization' described there is, it is far too crude for quite a lot of purposes. A better form of comparison is converting both texts to either uppercase or lowercase. And, once we do this, we hit two walls (or obstacles) to overcome. The steps I can think of are:
1) Equivalent code points. We need first to 'compose' the text and then substitute the relevant (and preferred) equivalent code points for any 'character glyph's in the texts.
2) We also need to take care of stuff like language dependent case transforms. See http://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I

As far as I know, this is the only 'proper' thing to do for search and comparison operations under unicode. I know it will be slower, but that is the price to pay.

Note: The reason I used the term 'character glyphs' is because several code points can be combined to make a 'character glyph'. See the definition of Code Point [ http://unicode.org/glossary/ ] which says:

Code Point: Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF (hexadecimal).

As an example, from the above Wiki article, we can use 2 code points to produce a 'character glyph', such as 'n' + '~' --> ñ

> But yes, even this is very limited (busstop), because even if you know the language of the word (german in my example) you do not know its meaning.

You do not worry about the meaning at all. In all languages (I guess) there are several words that may be written the same but mean different things.

> Without a full dictionary, you do not know if ss and german-sharp-s are the same or not.

True. But, if you do know it is in German, then you definitely know they are. And, this makes a lot of difference.

> So basically what you want to do, can only be done with a full dictionary. Or you have to accept false positives.

Nope. No false positives at the text level.
You can always, of course, get false positives at the semantic level --such as when you're looking for 'apple' (the fruit) and 'Apple' (the brand name), but that's a completely different problem.

> I also fail to see why a utf8 string is a half baked solution. It will serve most people fine. It can be extended for those who want more.

I have nothing against UTF-8 or any other encoding scheme. It is just that --an encoding scheme. Most handy as a means of transporting data from one medium/app to another. But UTF-8 does in no way cover the whole of Unicode, nor is it a complete solution for dealing with unicode. It is, after all, an encoding scheme.

> > > BUT of course there is no way to deal with the ambiguous Busstop
> > > Not even if you knew that Busstop was a german string?
> > Indeed. For this case, you need to know what language Busstop was written in.
> you need a dictionary. knowing it is German is not enough. because all that "it is german" tells you is, that ss maybe a sharp-s, but doesn't have to be

A dictionary, then, wouldn't help you either because
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> Actually, UTF-8 can contain bidi info, it's indeed a matter of the renderer.

And, how do you propose doing a case-insensitive search in a given text that contains multiple languages?
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> IMHO You can't? But you could use a TStringList.

I don't think I could. Because, in TStringList, you have no way of knowing what language each item belongs to. You could, of course, work around it by adding a fake object to each item denoting the language, but that does mean a generalized solution.

> I also do not know of other apps that could do this. (And it may not be possible). Look around. Databases for example, AFAIK the most you can do is define a collation per column.

True. But, that does not mean that those apps/databases are well thought out. Does it?

> And how would you sort the following example, with mixed collation. Take the various german collations. ae can be used as a substitution for a-umlaut.

This is actually an arbitrary decision --there is no agreed standard on this, that I am aware of-- so, each developer can have their own way.

> How would you sort data where one source is of one collation, the other source of another (or even worse the collation changes halfway through)? It is impossible by definition.

No. It is not impossible. But, yes, there is no definition (standard). It would be up to the developer or the entity that the developer is working in.

> I even think that collation is not part of the string. it does not change the meaning of the string. It is only used in specific operations. And then it must be one collation for both strings. So if each of the string had a collation that would cause an issue.

But, my question is --imho-- a lot more relevant to the thread at hand: How would you do case-insensitive search in a multilingual text? [this has nothing to do with rendering or GUI.]
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> So maybe the design is quite well thought?

Adding a flag field is easy enough --if all you're doing is some sort of collation. In that sense, everything is well thought out. But.. Life becomes very complicated when you begin to do things like FTS (full text search) on a multilanguage text in a DB engine. Your options, in this case, are very limited:
-- Ignore the language issue.
or
-- store each language in a different field (that is if you know how many there will be).

Do you think this is a good solution --or a hack?

> As for Storing info per string or per char. (Info could be anything: collation, color, style, font, source-of-quote, author, creation-date, file, ) everyone would like their own. So again FPC shouldn't do it. Or everyone gets all the overhead of what all the others wanted.

Collation is a function of language. All the others are not an intrinsic part of a char at all --they vary by context.

> Also FPC is a programming language. Not a word processing tool

Well, they should have remembered that before they added in char and string types when everything could perfectly be represented with a byte.

> Then instead of asking for strings as object, I would ask for an additional ref-counted object type (with auto destruction). The string library could be based on this. I am not asking for such a thing because a) it wouldn't be pascal anymore. b) beware of the mem-leaks

Personally, I gave up on strings as objects on the compiler level. That could, of course, be added as a lib.

> If pascal doesn't suit the need of a specific task, choose a different tool. Instead of inventing a new pascal.

Thank you for the advice. But, instead of jailing this discussion to at best a laterally relevant issue of collation, can I ask you to think for a moment: How on earth can you do a case-INsensitive search in *any* given string that contains multiple language substrings? Please note the 'case-INsensitive' keyword there.

> Btw in normal math you can not divide a number by zero...
> Of course you can define your own math

And, the point is??..
Re: [fpc-devel] Unicodestring branch, please test and help fixing
> Yes, but most proposals here about a TCharacter are a bit overkill. For example, the language reference for a given char is not very important from a Unicode point of view, unicode focuses its power in the text, so locale is important in context operations and collations.

See my other post above. Locale should really have nothing to do with the text/string business. Instead, it should only refer to oddities such as decimal number representations, thousands separators, date and time strings etc. Packing the language into the 'locale' info is an abuse IMO, unless it refers to such things as what kind of help file it should display to the user or the actual strings on menu items (resources) etc.

> From my point of view the compiler basic types must keep being basic, so be fast, no more than needed memory eaters and so on.

Please don't take offence, but this kind of attitude is verging on being offensive.. Instead of looking at the issue from the POV of "I don't need it" or "It requires more hardware resources", can't you try to evaluate the need on its own merit? And, if you still think that you will never need it, please remember that you don't have to --but others may.

> Bringing Unicode power to the basic string type is overkill, any Unicode operation will in the best case take double the time, and some of them will be 40-50 times slower. A simple collation will take at least 4 times the memory needed by the string itself and for most sort algorithm needs the collation is unnecessary.

So? What if it is a fact of life? Such as 24-bit graphics. We all know it takes a lot more resources and that only patsies need that much color; we ended up using it. Can't you consider this unicode character in the same light? (no pun).
> So think of a new user filling a TStringList with 1000 strings and invoking the Sort method; as the strings are Unicode they must be ordered using the locale collation or the general collation, and finally saying "20 seconds to sort 1000 strings, this looks even worse than javascript".

No. This is where you are mistaken, I'm afraid. A TUnicodeStringList can contain strings from different collations and one 'locale' information will be useless in sorting out that mess. You need 'language' information in each of those strings to be able to properly sort that unicode list.

> Maybe, again from my point of view, it is more logical to create TTextUnicodeChar and TTextUnicodeString classes which handle Unicode textual data, not Unicode data.

I can't see how you can do that. I can't see how we can cater for unicode data (not textual data, as you put it) in anything other than a specific class [or data type]

> PS: As one of the problems of Unicode support is the big amount of data that must be stored (in exe or external file) is there any recommended way to code, that unused arrays are left out when the function that uses that array is never called in the main program ?

Storage is a completely different problem. You could use, say, UTF-8 encoding and store also the language information when necessary.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Michael Van Canneyt wrote:
> You are mixing 2 things:
> - Texts (strings) at the compiler language level.
> - (complex) GUI design that needs to handle a lot of text and a lot of extra properties.

:) If you draw the lines so red and thick, who am I to disagree... But, I could write a gigantic data mining application, a database application or a myriad of such apps that use the above class without doing a single pixel of GUI stuff.

> For GUI design, you may well need all the things you describe. And as I said before: you can do this yourself if you need it.

True. I could also do my own TList, TStringList etc. etc. but, luckily, I don't have to. I was under the impression, therefore, that stuff that makes life easier for a number of developers gets to be included in the main distribution for common use, and is not rejected on the basis of /language level/.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
>> But, I could write a gigantic data mining application, a database application or a myriad of such apps that use the above class without doing a single pixel of GUI stuff.

> I'd like to see that: it will be guaranteed dog slow :(

Hmm.. maybe, maybe not. Last year I wrote a natural language parser (Pascal) and gave the source to a Java developer friend of mine. It turned out to be faster in Java --classes and all. For some reason, using the same algorithm (my code converted to Java, basically), Java beat my natively compiled code. And, no, there was no GUI involved. Basing my argument upon this world-shattering anecdotal evidence, I hereby prove my point. So, there :P

> However, changing the object pascal language, so it requires the use of objects whenever you use strings: this is a different story. And that is what it was all about, after all.

Ooops! I joined too late then. OK. I retract {I am said to come from a bargaining culture though I have yet to hone my skills with a carpet dealer, but I'll try my luck with compiler guys all the same} and ask, instead, to give us reference-counted 4-byte (actually, preferably 6-byte) per cell arrays/strings. If I can have such a beast, it will be fast enough and will also cover almost all of the foreseeable problems.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
>> compiler guys all the same} and ask, instead, to give us reference-counted 4-byte (actually, preferably 6-byte) per cell arrays/strings.

> What's wrong with a dyn. array of DWord?

Much like what's wrong with a dynamic array of Word (as opposed to WideString) or with a dynamic array of Byte (as opposed to string), really... Nothing much. But it is far more readable when there is a special and reserved type for which we could have special operators and converters, just like those we have for strings and widestrings.
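To make the "special type with operators and converters" point concrete, here is a rough sketch of what can already be approximated in user code with FPC operator overloading. TUcs4Text and both operators are hypothetical names for illustration, not an existing RTL type, and the converter assumes 7-bit ASCII input so that one byte equals one code point.

```pascal
program Ucs4Sketch;
{$mode objfpc}{$H+}

type
  // Hypothetical type: one 4-byte cell per character, held in a
  // reference-counted dynamic array.
  TUcs4Text = record
    Cells: array of DWord;
  end;

// Converter from a byte string. Assumes S holds only 7-bit ASCII,
// so each byte maps directly to one code point.
operator := (const S: AnsiString) R: TUcs4Text;
var
  I: Integer;
begin
  SetLength(R.Cells, Length(S));
  for I := 1 to Length(S) do
    R.Cells[I - 1] := Ord(S[I]);
end;

// Concatenation with '+', mirroring the built-in string types.
operator + (const A, B: TUcs4Text) R: TUcs4Text;
var
  I: Integer;
begin
  SetLength(R.Cells, Length(A.Cells) + Length(B.Cells));
  for I := 0 to High(A.Cells) do
    R.Cells[I] := A.Cells[I];
  for I := 0 to High(B.Cells) do
    R.Cells[Length(A.Cells) + I] := B.Cells[I];
end;

var
  S1, S2: AnsiString;
  X, Y, Z: TUcs4Text;
begin
  S1 := 'foo';
  S2 := 'bar';
  X := S1;      // uses the := converter
  Y := S2;
  Z := X + Y;   // uses the + operator
  WriteLn('cells: ', Length(Z.Cells));
end.
```

This only goes part of the way: a reserved compiler type would also get literals, indexing from 1, and automatic conversions, which a wrapper record cannot fully provide.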
Re: [fpc-devel] Unicodestring branch, please test and help fixing
>> But it is far more readable when there is a special and reserved type for which we could have special operators and converters, just like those we have for strings and widestrings.

> Oh, I thought people just complained in this thread that + isn't appropriate for strings anyways ...

People are, of course, entitled to their opinions. And I --for one-- would never force them against their will to use the '+' operator for any sort of strings. In the same breath, the fact that some of us object to '+' should not, IMO, be the basis for not having 4-byte (or 6-byte) per char strings.
Re: [fpc-devel] Unicodestring branch, please test and help fixing
Graeme Geldenhuys wrote:
> I have to say I agree with you. The Object Pascal / Delphi language already has way too many string types! And it's just getting worse. I've always liked the Java style of everything being an object - even the string type.

The more I look at this Unicode issue, the more I believe we need a fundamental object approach to it. I mean, before a TString class, we need a TCharacter class in which we need to specify --amongst other things-- what language that character belongs to. This kind of information is needed in order to properly manage the (upper-, lower-, title-, and camel-?) casing issues. On top of this, we also need this information in order to be able to mix and match and display the LTR (left-to-right) and RTL (right-to-left) pieces of strings within the same string.

I have done some work on this, but there are at least 2 issues:

1) Since each character is a class, memory requirements are increased several fold.
2) Again, the character-as-class also means that the speed with which we can create and destroy (and manipulate) a string is a lot slower. I am, at this point, wondering if FPC's object creation/destroy code could be optimized further to help with this issue.
3) How do you handle the character sets when characters are objects?
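The memory point in issue 1) can be checked directly. The following is only a sketch with a hypothetical TCharacter layout (the field names and the numeric language tag are assumptions); it prints the size of a bare 4-byte cell next to the size of one heap instance plus the reference each string slot would need. The actual numbers depend on the target platform, so none are quoted here.

```pascal
program CharClassCost;
{$mode objfpc}{$H+}

type
  // Hypothetical character class; field names are illustrative only.
  TCharacter = class
    CodePoint: DWord;   // the character itself
    Language: Word;     // assumed numeric language tag
  end;

var
  C: TCharacter;
begin
  // A bare 4-byte cell versus one heap-allocated instance.
  WriteLn('DWord cell:          ', SizeOf(DWord));
  WriteLn('TCharacter instance: ', TCharacter.InstanceSize);
  // Every position in a string also needs a reference to its instance,
  // and the heap manager adds its own per-block overhead on top.
  WriteLn('Reference per slot:  ', SizeOf(Pointer));

  C := TCharacter.Create;
  try
    C.CodePoint := Ord('A');
    C.Language := 0;
  finally
    C.Free;
  end;
end.
```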
Re: [fpc-devel] Too big documentation images
Michael Van Canneyt wrote:
> Exactly. There are 4 converters:
> - Latex2html - a huge perl script which needs more time and memory than God needed to create the earth, to convert the docs. We dumped it because it simply takes too long. It's good for small documents (articles), but not for books.
> - tex4ht - Was very good, but has some issues. Problem is that it's no longer maintained, and is no longer included in e.g. SuSE.
> - Hevea - also good, but is written in objective caml (and I thought Pascal was considered obscure) so it also doesn't install easily on a SuSE system.
> - tex2rtf - Very good, fast. But only supports a subset of latex, not enough for the documentation.

There's also one other converter, namely Hyperlatex [ http://hyperlatex.sourceforge.net/ ], written in Lisp. It is mentioned on this page, http://small.dropbear.id.au/docs/latexhtml.html , and seems to be actively maintained --the README file in the CVS says the latest version, v2.9, is dated November 2006.

BTW, I know nothing about Lisp or Latex; but I did some googling to help if I can.
Re: [fpc-devel] An incomplete prototype of my ebook FREE PASCAL FROM SQUARE ONE
I have read the text and I'd like to thank the author for the immense effort (both in the past and in the future) put into it. The only criticism I might have would be that it is a little too US-centric. I am referring to the stuff related to measurement systems, and various other things that are peculiar to living in the US. These are likely to make the job of a translator rather hard and/or cause unnecessary confusion for the non-US (and non-English) reader. Unless, of course, you let the translators do their job by translating the gist of the matter --i.e. do cultural translation-- by skipping or altering those parts.
[fpc-devel] OT? History Based Stack Reconstruction
I wonder if this might be a useful read to the architects here. http://weblogs.mozillazine.org/roc/archives/2007/04/history_based_s.html
Re: [fpc-devel] Templates / Generics; Vote
As far as voting goes, personally, I prefer something like this:

    GList = generic class(T)

And, are we going to have non-class routines, such as event declarations; i.e.

    TGenericCallback = generic function(AValue1: TGenericValue; AValue2: TGenericValue): Integer;
    TSomeGenericEvent = generic procedure(ASender: TObject; ACallBack: TGenericCallback) of object;
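For comparison with the proposal above, here is a sketch of the same ideas in the syntax that FPC's objfpc mode accepts in current releases, where the `generic` keyword precedes the type name and `specialize` creates a concrete type. This is not the syntax proposed in this message; TGBox, TIntBox, TIntCallback and CompareInts are hypothetical names, and generic procedural types are assumed to be available in the compiler version used.

```pascal
program GenericSketch;
{$mode objfpc}{$H+}

type
  // Generic procedural type: a callback comparing two values of type T.
  generic TGenericCallback<T> = function(AValue1, AValue2: T): Integer;

  // Generic class holding a single item of type T.
  generic TGBox<T> = class
  private
    FItem: T;
  public
    property Item: T read FItem write FItem;
  end;

  // Concrete types are created with 'specialize'.
  TIntCallback = specialize TGenericCallback<Integer>;
  TIntBox = specialize TGBox<Integer>;

function CompareInts(AValue1, AValue2: Integer): Integer;
begin
  Result := AValue1 - AValue2;
end;

var
  Box: TIntBox;
  Cmp: TIntCallback;
begin
  Cmp := @CompareInts;
  Box := TIntBox.Create;
  try
    Box.Item := 42;
    WriteLn(Cmp(Box.Item, 40));
  finally
    Box.Free;
  end;
end.
```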
[fpc-devel] About Linker
I know very little about linkers --just vaguely what one does, and that's all. I googled for it, but could not find much either. Could someone give me some pointers to read about the subject? The reason I am interested is that a linker seems to be more difficult to write than a compiler --after all, we do have FPC, but not a linker in Pascal. Is it true that a linker is harder to write? If not, why hasn't anyone done one yet? TVMIA
Re: [fpc-devel] Re: [fpc-l] type discussion
Marc Weustink wrote:
>>>> -- Class Contracts
>>>> I like the 'require/ensure' approach. It makes the code more robust and more debuggable, IMHO

>>> I think the checks you can do there are too limited. I also wonder what will happen if a require isn't met. Personally I don't want exceptions in my released app.

>> No, these are assertions, not exceptions.

> OK, what to do if an invalid input is met? Continue? Skip? Abort? IMO you still need some code which takes proper action.

You have a point here. That, I suppose, could be handled through runtime options. But a construct something like

    require [...] otherwise [...] end;
    ensure [...] otherwise [...] end;

would be needed.

>>>> -- Generics
>>>> I am not sure if Generics could be done in FPC.

>>> There were some discussions about it here and AFAIK some are trying to implement it.

>> Any links?

> http://www.freepascal.org/wiki/index.php/Generics

Thanks.

>>>> -- Virtual Properties and Events

>>> The examples given there are not very different from what is possible now. Make SetWidth virtual and you have almost the same. What however would be nice is if you could override the getter or setter. Something like
>>>     property Width write MySetWidth

>> I think you missed a few things here.
>>     type TMyClass = class ... property Width: integer read write; virtual; abstract; end;
>> As you can see, getters and setters are not in the picture at all. Which means you have all the freedom you want in the derived class.

> Which is almost the same as a virtual abstract Getter and Setter (almost; read/write from a field isn't covered).

>> Plus, I like the idea that I could have a base class with a read-only property that can not be overridden to be read-write later.
>>     property Width: integer read; virtual; abstract;

> That makes some sense (but it would be incompatible with existing code).

Why would it? Existing code does not have virtual properties.
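For reference, the "make SetWidth virtual" approach Marc describes looks roughly like this in existing Object Pascal. This is only a sketch of the present-day workaround, not of the proposed `virtual; abstract;` property syntax; TBase and TClamped are hypothetical class names.

```pascal
program VirtualSetterSketch;
{$mode objfpc}{$H+}

type
  TBase = class
  private
    FWidth: Integer;
  protected
    // The property itself is not virtual; its accessors are.
    function GetWidth: Integer; virtual;
    procedure SetWidth(AValue: Integer); virtual;
  public
    property Width: Integer read GetWidth write SetWidth;
  end;

  TClamped = class(TBase)
  protected
    // The derived class changes the property's behaviour via the setter.
    procedure SetWidth(AValue: Integer); override;
  end;

function TBase.GetWidth: Integer;
begin
  Result := FWidth;
end;

procedure TBase.SetWidth(AValue: Integer);
begin
  FWidth := AValue;
end;

procedure TClamped.SetWidth(AValue: Integer);
begin
  // Reject negative widths, then defer to the inherited storage.
  if AValue < 0 then
    AValue := 0;
  inherited SetWidth(AValue);
end;

var
  B: TBase;
begin
  B := TClamped.Create;
  try
    B.Width := -5;      // dispatches to TClamped.SetWidth
    WriteLn(B.Width);
  finally
    B.Free;
  end;
end.
```

What this cannot express is the case raised in the reply: a base class that declares only that a readable Width exists, leaving the derived class free to back it with a field or a method.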
Re: [fpc-devel] Re: [fpc-l] type discussion
>> -- Class Contracts
>> I like the 'require/ensure' approach. It makes the code more robust and more debuggable, IMHO

> I think the checks you can do there are too limited. I also wonder what will happen if a require isn't met. Personally I don't want exceptions in my released app.

No, these are assertions, not exceptions.

>> -- Generics
>> I am not sure if Generics could be done in FPC.

> There were some discussions about it here and AFAIK some are trying to implement it.

Any links?

>> -- Virtual Properties and Events

> The examples given there are not very different from what is possible now. Make SetWidth virtual and you have almost the same. What however would be nice is if you could override the getter or setter. Something like
>     property Width write MySetWidth

I think you missed a few things here.

    type TMyClass = class ... property Width: integer read write; virtual; abstract; end;

As you can see, getters and setters are not in the picture at all. Which means you have all the freedom you want in the derived class. Plus, I like the idea that I could have a base class with a read-only property that can not be overridden to be read-write later.

    property Width: integer read; virtual; abstract;

> OK, while I like the idea, I can not think of how I would use it though :-) Can someone help me out here <G>

>> -- Enhanced Multicast Events

> This is not really new. You can implement it yourself like
>     property OnChange: TNotifyList;
> and then OnChange.Add(Notifyproc) or OnChange.Remove(Notifyproc)

OK. Nice to be able to do that. Do I have to write my TNotifyList every time I need it?

>> Inline variable initializers, such as: [snip]
>>     var Integer1: Integer = 15; Boolean1: Boolean = False; String1: String = 'SOME TEXT';

> Hmm.. sometimes useful. You can put it as the first lines in your constructor/codeblock, but keeping it together in, say, large classes can be handy.

Yes, and it improves the readability, IMHO. Plus, there is no reason for you to alter that in the constructor/codeblock too.
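On the TNotifyList question: a hand-written version for the stock TNotifyEvent signature is short and only has to be written once. The sketch below is hypothetical throughout (TNotifyList as used above is not an RTL class, and the method names Add, Remove and Call are assumptions); it stores method pointers in a dynamic array and compares them through TMethod.

```pascal
program NotifyListSketch;
{$mode objfpc}{$H+}

uses
  Classes;

type
  // Hypothetical multicast list for TNotifyEvent handlers.
  TNotifyList = class
  private
    FHandlers: array of TNotifyEvent;
  public
    procedure Add(AHandler: TNotifyEvent);
    procedure Remove(AHandler: TNotifyEvent);
    procedure Call(Sender: TObject);
  end;

  TListener = class
    Name: string;
    procedure Changed(Sender: TObject);
  end;

procedure TNotifyList.Add(AHandler: TNotifyEvent);
begin
  SetLength(FHandlers, Length(FHandlers) + 1);
  FHandlers[High(FHandlers)] := AHandler;
end;

procedure TNotifyList.Remove(AHandler: TNotifyEvent);
var
  I, J: Integer;
begin
  for I := 0 to High(FHandlers) do
    // A method pointer is a (code, data) pair; both must match.
    if (TMethod(FHandlers[I]).Code = TMethod(AHandler).Code) and
       (TMethod(FHandlers[I]).Data = TMethod(AHandler).Data) then
    begin
      // Close the gap, shrink the array, and stop after the first match.
      for J := I to High(FHandlers) - 1 do
        FHandlers[J] := FHandlers[J + 1];
      SetLength(FHandlers, Length(FHandlers) - 1);
      Exit;
    end;
end;

procedure TNotifyList.Call(Sender: TObject);
var
  I: Integer;
begin
  // Invoke every registered handler in registration order.
  for I := 0 to High(FHandlers) do
    FHandlers[I](Sender);
end;

procedure TListener.Changed(Sender: TObject);
begin
  WriteLn(Name, ' notified');
end;

var
  OnChange: TNotifyList;
  A, B: TListener;
begin
  OnChange := TNotifyList.Create;
  A := TListener.Create;
  B := TListener.Create;
  try
    A.Name := 'A';
    B.Name := 'B';
    OnChange.Add(@A.Changed);
    OnChange.Add(@B.Changed);
    OnChange.Call(nil);            // both listeners run
    OnChange.Remove(@A.Changed);
    OnChange.Call(nil);            // only B runs
  finally
    B.Free;
    A.Free;
    OnChange.Free;
  end;
end.
```

A different event signature would still need its own list type, which is the repetition the question is getting at; generics are one way that could be avoided.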
Re: [fpc-devel] modernizing pascal discussions
> If you want to modernize the language you can take the current fpc code and extend it yourself. If the extension is clear and we agree on it, it can eventually be put in the main fpc release.

Discussions are useful. Before one starts coding away, a consensus would be nice to have. I would not want to spend days on something only to have it thrown out simply because the idea or the principle got on the wrong side of others.