Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sat, 22 Nov 2008 23:05:43 +0200
listmember [EMAIL PROTECTED] wrote:

> Is there a way to determine how much memory is consumed by strings in
> a running application?
>
> I'd like to know this, in particular, for FPC and Lazarus -- to begin
> with.
>
> The reason I'd like to know is this: whenever I suggest that the char
> size be increased to 4 bytes, the idea is opposed on the grounds that
> it will need huge amounts of memory -- 4 times as much.
>
> There is of course some merit in that argument, but I have no idea
> what it is '4 times' of.
>
> This is not very engineer-like -- it is unmeasured.
>
> Can anyone suggest a way to measure the memory load caused by strings?

The exact amount depends on the application, but think about loading
100 MB of text files into strings. This needs at least the 100 MB plus
the overhead for each string (at least 12 bytes). With 2-byte chars an
extra 100 MB would be needed, and with 4-byte chars an extra 300 MB.

For example, the Lazarus IDE typically holds 50 to 200 MB of sources in
memory. If this were changed to unicodestring (2 bytes per char), the
IDE would need 50 to 200 MB more memory. And because many
time-consuming tasks are already bound by the memory bandwidth of
current computers, the IDE would become twice as slow. Do the math for
4 bytes per char.
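
The arithmetic above can be written down as a minimal sketch (it is
not part of the original message). It assumes ASCII text, so one
character per byte on disk, and uses the 12-byte per-string overhead
only as the lower bound quoted above. The names EstimateBytes and
StringOverhead are made up for this illustration.

```pascal
program StringMemEstimate;
{$mode objfpc}{$H+}

const
  MB = 1024 * 1024;
  StringOverhead = 12; // per-string header; lower bound quoted in the thread

// Bytes needed to hold CharCount characters in a single string at
// BytesPerChar bytes per character, plus the assumed header.
function EstimateBytes(CharCount: Int64; BytesPerChar: Integer): Int64;
begin
  Result := CharCount * BytesPerChar + StringOverhead;
end;

var
  Chars: Int64;
  Width: Integer;
begin
  Chars := 100 * Int64(MB); // 100 MB of ASCII text
  Width := 1;
  while Width <= 4 do
  begin
    WriteLn(Width, ' byte(s) per char: ',
      EstimateBytes(Chars, Width) div MB, ' MB');
    Width := Width * 2;     // 1, 2, 4
  end;
end.
```

For 100 MB of ASCII text this gives 100, 200 and 400 MB, i.e. the extra
100 MB and 300 MB mentioned above. Text that is mostly non-ASCII would
change the ratio, since its UTF-8 form already uses several bytes per
character.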


Mattias
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 10:19, Mattias Gaertner wrote:

> On Sat, 22 Nov 2008 23:05:43 +0200
> listmember [EMAIL PROTECTED] wrote:
>
> > Is there a way to determine how much memory is consumed by strings in
> > a running application?
> >
> > I'd like to know this, in particular, for FPC and Lazarus -- to begin
> > with.
> >
> > The reason I'd like to know is this: whenever I suggest that the char
> > size be increased to 4 bytes, the idea is opposed on the grounds that
> > it will need huge amounts of memory -- 4 times as much.
> >
> > There is of course some merit in that argument, but I have no idea
> > what it is '4 times' of.
> >
> > This is not very engineer-like -- it is unmeasured.
> >
> > Can anyone suggest a way to measure the memory load caused by strings?
>
> The exact amount depends on the application, but think about loading
> 100 MB of text files into strings. This needs at least the 100 MB plus
> the overhead for each string (at least 12 bytes). With 2-byte chars an
> extra 100 MB would be needed, and with 4-byte chars an extra 300 MB.
>
> For example, the Lazarus IDE typically holds 50 to 200 MB of sources in
> memory. If this were changed to unicodestring (2 bytes per char), the
> IDE would need 50 to 200 MB more memory. And because many
> time-consuming tasks are already bound by the memory bandwidth of
> current computers, the IDE would become twice as slow. Do the math for
> 4 bytes per char.


What I had in mind wasn't to store the string data as UTF-32 (or
UCS-4); on disk it would still be UTF-8 or whatever.

I am only considering the in-memory representation being UTF-32 (or
UCS-4).

This way, loading and saving would hardly be affected, yet in-memory
operations would be a lot faster and simpler.


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 10:31:39 +0200
listmember [EMAIL PROTECTED] wrote:

[...]
> What I had in mind wasn't to store the string data as UTF-32 (or
> UCS-4); on disk it would still be UTF-8 or whatever.
>
> I am only considering the in-memory representation being UTF-32 (or
> UCS-4).

What do you mean by 'memory representation'?

> This way, loading and saving would hardly be affected, yet in-memory
> operations would be a lot faster and simpler.


Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


> > I am only considering the in-memory representation being UTF-32 (or
> > UCS-4).
>
> What do you mean by 'memory representation'?

That each char of a string in memory would be 4 bytes (or more); yet,
when saved to disk (or transmitted across the net etc.), it would be
UTF-8 compressed. IOW, no compression would be applied to in-memory
strings.


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


Actually, load times do not seem to be linear at all.

A file about 3.2 times larger (labelled 400 MB vs. 100 MB in the output
below) seems to take only a little over twice as long.

I did one very simple test using 2 text files:

File 1: 384 MB (403,248,710 bytes)
File 2: 120 MB (126,680,448 bytes)

with the code below:

// Needs Windows, SysUtils and Classes in the uses clause. FILE_1 and
// FILE_2 are file-name constants that were not shown in the post.
procedure TForm1.Button1Click(Sender: TObject);
var
  InitialValue1: Int64; // performance counter at the start of a run
  Divisor1: Int64;      // performance counter frequency (ticks/second)
  CurrentValue1: Int64; // performance counter at the end of a run
  Time1: double;        // seconds taken to load FILE_1
  Time2: double;        // seconds taken to load FILE_2
  Stream1: TMemoryStream;
  Index1: integer;
begin
  Memo1.Lines.Clear;
  QueryPerformanceFrequency(Divisor1);

  Index1 := 0;
  while Index1 < 100 do begin
    // Time loading the larger file into memory
    QueryPerformanceCounter(InitialValue1);
    Stream1 := TMemoryStream.Create;
    Stream1.LoadFromFile(FILE_1);
    Stream1.Free;
    QueryPerformanceCounter(CurrentValue1);
    Time1 := (CurrentValue1 - InitialValue1) / Divisor1;

    // Time loading the smaller file into memory
    QueryPerformanceCounter(InitialValue1);
    Stream1 := TMemoryStream.Create;
    Stream1.LoadFromFile(FILE_2);
    Stream1.Free;
    QueryPerformanceCounter(CurrentValue1);
    Time2 := (CurrentValue1 - InitialValue1) / Divisor1;

    Memo1.Lines.Add(Format('[400 MB: %3.3ns] [100 MB: %3.3ns]',
      [Time1, Time2]));
    Inc(Index1);
  end;
end;

Output:

[400 MB: 0.514s] [100 MB: 0.241s]
[400 MB: 0.535s] [100 MB: 0.239s]
[400 MB: 0.532s] [100 MB: 0.252s]
[400 MB: 0.532s] [100 MB: 0.245s]
[400 MB: 0.541s] [100 MB: 0.240s]
[400 MB: 0.533s] [100 MB: 0.240s]
[400 MB: 0.540s] [100 MB: 0.240s]
[400 MB: 0.532s] [100 MB: 0.245s]
[400 MB: 0.532s] [100 MB: 0.234s]
[400 MB: 0.538s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.242s]
[400 MB: 0.531s] [100 MB: 0.242s]
[400 MB: 0.585s] [100 MB: 0.252s]
[400 MB: 0.531s] [100 MB: 0.243s]
[400 MB: 0.531s] [100 MB: 0.289s]
[400 MB: 0.569s] [100 MB: 0.240s]
[400 MB: 0.532s] [100 MB: 0.235s]
[400 MB: 0.535s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.242s]
[400 MB: 0.532s] [100 MB: 0.239s]
[400 MB: 0.531s] [100 MB: 0.241s]
[400 MB: 0.532s] [100 MB: 0.239s]
[400 MB: 0.532s] [100 MB: 0.245s]
[400 MB: 0.536s] [100 MB: 0.239s]
[400 MB: 0.534s] [100 MB: 0.256s]
[400 MB: 0.547s] [100 MB: 0.242s]
[400 MB: 0.535s] [100 MB: 0.261s]
[400 MB: 0.530s] [100 MB: 0.232s]
[400 MB: 0.541s] [100 MB: 0.239s]
[400 MB: 0.533s] [100 MB: 0.243s]
[400 MB: 0.535s] [100 MB: 0.244s]
[400 MB: 0.530s] [100 MB: 0.231s]
[400 MB: 0.540s] [100 MB: 0.240s]
[400 MB: 0.582s] [100 MB: 0.330s]
[400 MB: 0.557s] [100 MB: 0.231s]
[400 MB: 0.539s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.230s]
[400 MB: 0.539s] [100 MB: 0.243s]
[400 MB: 0.531s] [100 MB: 0.246s]
[400 MB: 0.535s] [100 MB: 0.240s]
[400 MB: 0.532s] [100 MB: 0.279s]
[400 MB: 0.609s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.249s]
[400 MB: 0.537s] [100 MB: 0.239s]
[400 MB: 0.531s] [100 MB: 0.242s]
[400 MB: 0.530s] [100 MB: 0.240s]
[400 MB: 0.535s] [100 MB: 0.238s]
[400 MB: 0.532s] [100 MB: 0.241s]
[400 MB: 0.536s] [100 MB: 0.242s]
[400 MB: 0.532s] [100 MB: 0.240s]
[400 MB: 0.534s] [100 MB: 0.230s]
[400 MB: 0.545s] [100 MB: 0.235s]
[400 MB: 0.538s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.235s]
[400 MB: 0.536s] [100 MB: 0.229s]
[400 MB: 0.540s] [100 MB: 0.232s]
[400 MB: 0.540s] [100 MB: 0.243s]
[400 MB: 0.539s] [100 MB: 0.234s]
[400 MB: 0.540s] [100 MB: 0.230s]
[400 MB: 0.539s] [100 MB: 0.261s]
[400 MB: 0.535s] [100 MB: 0.242s]
[400 MB: 0.529s] [100 MB: 0.234s]
[400 MB: 0.538s] [100 MB: 0.234s]
[400 MB: 0.538s] [100 MB: 0.244s]
[400 MB: 0.535s] [100 MB: 0.242s]
[400 MB: 0.529s] [100 MB: 0.239s]
[400 MB: 0.532s] [100 MB: 0.251s]
[400 MB: 0.631s] [100 MB: 0.236s]
[400 MB: 0.535s] [100 MB: 0.242s]
[400 MB: 0.531s] [100 MB: 0.243s]
[400 MB: 0.531s] [100 MB: 0.239s]
[400 MB: 0.531s] [100 MB: 0.232s]
[400 MB: 0.543s] [100 MB: 0.239s]
[400 MB: 0.528s] [100 MB: 0.232s]
[400 MB: 0.538s] [100 MB: 0.242s]
[400 MB: 0.537s] [100 MB: 0.233s]
[400 MB: 0.537s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.230s]
[400 MB: 0.543s] [100 MB: 0.242s]
[400 MB: 0.533s] [100 MB: 0.240s]
[400 MB: 0.531s] [100 MB: 0.253s]
[400 MB: 0.537s] [100 MB: 0.243s]
[400 MB: 0.547s] [100 MB: 0.238s]
[400 MB: 0.539s] [100 MB: 0.233s]
[400 MB: 0.545s] [100 MB: 0.257s]
[400 MB: 0.572s] [100 MB: 0.318s]
[400 MB: 0.563s] [100 MB: 0.238s]
[400 MB: 0.536s] [100 MB: 0.241s]
[400 MB: 0.533s] [100 MB: 0.249s]
[400 MB: 0.531s] [100 MB: 0.242s]
[400 MB: 0.534s] [100 MB: 0.241s]
[400 MB: 0.532s] [100 MB: 0.238s]
[400 MB: 0.537s] [100 MB: 0.241s]
[400 MB: 0.616s] [100 MB: 0.253s]
[400 MB: 0.536s] [100 MB: 0.228s]
[400 MB: 0.540s] [100 MB: 0.244s]
[400 MB: 0.539s] [100 MB: 0.237s]
[400 MB: 0.536s] [100 MB: 0.241s]
[400 MB: 0.539s] [100 MB: 0.236s]

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Sergei Gorelkin


Graeme Geldenhuys wrote:

> On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
> [EMAIL PROTECTED] wrote:
> > For example, the Lazarus IDE typically holds 50 to 200 MB of sources in
> > memory. If this were changed to unicodestring (2 bytes per char), the
> > IDE would need 50 to 200 MB more memory.
>
> Ah, and that would probably explain why Martin decided not to
> pre-parse units in MSEide - for things like code completion etc...
> MSEide's memory usage would balloon greatly, compared to Lazarus.

One can always choose the string type which is most appropriate for the 
given task. For storing Pascal (or whatever) sources, one choice is not 
to use plaintext at all, but replace each identifier with its index in a 
dictionary. It depends on the task.


Regards,
Sergei

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 11:09:25 +0200
listmember [EMAIL PROTECTED] wrote:

> > > I am only considering the in-memory representation being UTF-32 (or
> > > UCS-4).
> >
> > What do you mean by 'memory representation'?
>
> That each char of a string in memory would be 4 bytes (or more); yet,
> when saved to disk (or transmitted across the net etc.), it would be
> UTF-8 compressed. IOW, no compression would be applied to in-memory
> strings.

I thought my example described just that. If strings use 4 bytes per
char then ASCII text will need 4 times more memory.


Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 13:07, Graeme Geldenhuys wrote:

> On Sun, Nov 23, 2008 at 12:29 PM, listmember [EMAIL PROTECTED] wrote:
>
> > What I am curious about is: 4 times of what?
>
> RAM, Random Access Memory, DIMMs: those little green sticks you
> shove into the motherboard.  :-)


:)

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys

On Sun, Nov 23, 2008 at 1:05 PM, listmember [EMAIL PROTECTED] wrote:
> I just checked (using Process Explorer, under Windows) and this is what I
> see:
>
> Working set: 2,216 K
> Peak Working set: 26,988 K
>
> I can't see where that 50 MB fits into that.

Well, it all depends on how many files you have open, project size etc...


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Sergei Gorelkin


listmember wrote:


> This is my thick day. So, permit me to ask this:
>
> Are you really saying that strings occupy 50 MB of Lazarus's memory
> footprint?
>
> I just checked (using Process Explorer, under Windows) and this is what
> I see:
>
> Working set: 2,216 K
> Peak Working set: 26,988 K
>
> I can't see where that 50 MB fits into that.


There's no easy way to tell how much storage the strings occupy. There
are functions like GetHeapStatus and GetFPCHeapStatus, but they return
the total amount of memory occupied by everything that the application
allocates: objects, dynamic arrays, strings etc.
However, you may hack into the RTL at the NewAnsiString / NewWideString /
NewUnicodeString procedures and install hooks that will record the
number of bytes requested. That shouldn't be too difficult to do.
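
As a minimal sketch of the first approach (not code from this thread),
GetFPCHeapStatus can be sampled before and after an allocation to see
how much the heap grew. This assumes the default FPC heap manager; a
replacement such as cmem may not fill in these fields. As noted above,
the figure covers everything allocated in between, not only strings,
so the measured region has to be kept small.

```pascal
program HeapDelta;
{$mode objfpc}{$H+}

var
  Before, After: TFPCHeapStatus;
  S: AnsiString;
begin
  Before := GetFPCHeapStatus;
  SetLength(S, 1000000);      // allocate one string of 1,000,000 chars
  After := GetFPCHeapStatus;

  // The difference includes the string header and any rounding done by
  // the heap manager, so it is expected to exceed the payload slightly.
  WriteLn('Length(S)         : ', Length(S));
  WriteLn('CurrHeapUsed delta: ', After.CurrHeapUsed - Before.CurrHeapUsed);
end.
```

Comparing the delta with Length(S) gives a rough idea of the per-string
overhead on a given platform and compiler version.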


Regards,
Sergei

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys

On Sun, Nov 23, 2008 at 12:29 PM, listmember [EMAIL PROTECTED] wrote:

> What I am curious about is: 4 times of what?

RAM, Random Access Memory, DIMMs: those little green sticks you
shove into the motherboard.  :-)


Regards,
  - Graeme -



Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys

On Sun, Nov 23, 2008 at 1:13 PM, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:
> > I can't see where that 50 MB fits into that.
>
> Well, it all depends on how many files you have open, project size etc...

As an example: with a small project, Lazarus sits at 26 MB of memory.
I then opened the MacOSAll.pas unit from FPC (a 10.2 MB text file), and
Lazarus's memory usage jumped to 80 MB. So as you can see, it varies
depending on what you have open etc.

Regards,
  - Graeme -



Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 13:05:15 +0200
listmember [EMAIL PROTECTED] wrote:

> On 2008-11-23 12:50, Jonas Maebe wrote:
>
> > On 23 Nov 2008, at 11:29, listmember wrote:
> >
> > > It is not hard to tell that an app that works with text files
> > > (such as Lazarus) will consume 4 times more memory per file
> > > loaded.
> > >
> > > But how much memory does, say, Lazarus itself consume
> > > specifically for string storage when run for the first time?
> >
> > From Mattias' original answer: "For example, the Lazarus IDE
> > typically holds 50 to 200 MB of sources in memory."
> >
> > I.e., at least 4 times 50 to 200 MB.
>
> This is my thick day. So, permit me to ask this:
>
> Are you really saying that strings occupy 50 MB of Lazarus's memory
> footprint?
>
> I just checked (using Process Explorer, under Windows) and this is
> what I see:
>
> Working set: 2,216 K
> Peak Working set: 26,988 K
>
> I can't see where that 50 MB fits into that.

Do a 'find declaration' on an identifier that does not exist. This
will explore all units of the uses section.


Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


> However, you may hack into the RTL at the NewAnsiString / NewWideString /
> NewUnicodeString procedures and install hooks that will record the
> number of bytes requested. That shouldn't be too difficult to do.


This is what I was looking for.
Thank you.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


> Do a 'find declaration' on an identifier that does not exist. This
> will explore all units of the uses section.

Now I see what you mean.

But isn't this a design choice: caching all sources in memory for speed
reasons, as opposed to opening and closing each file on demand?

Still, if that is how it works, it is how it works.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione




On Sun, 23 Nov 2008, listmember wrote:

> What I had in mind wasn't to store the string data as UTF-32 (or
> UCS-4); on disk it would still be UTF-8 or whatever.
>
> I am only considering the in-memory representation being UTF-32 (or
> UCS-4).
>
> This way, loading and saving would hardly be affected, yet in-memory
> operations would be a lot faster and simpler.


For source code, an extended-ASCII charset like UTF-8 is the best
choice: since all characters that need processing are in the ASCII
range, the code needs to do nothing about the high (non-ASCII) bytes
except keep them in one piece.

Therefore, any other encoding is a waste of memory and does not gain you
any speed. For that reason, I don't see the compiler switching away from
8-bit processing either.

The situation is very different when processing real text: the memory
saving advantages disappear for the majority of the world, and if you
want to process characters beyond #127, UTF-16 and UTF-32 are much
easier. Obviously, UTF-32 is the best encoding if the characters you
need to process are beyond #65535.

Only if you need to process characters (rather than pass them on) is
UTF-32 a lot faster and simpler.


Daniël

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember




On 2008-11-23 13:49, Jonas Maebe wrote:


> On 23 Nov 2008, at 12:35, listmember wrote:
>
> > But isn't this a design choice: caching all sources in memory for
> > speed reasons, as opposed to opening and closing each file on demand?
>
> For very large projects, that should probably be done anyway at some
> point. But even in that case, using a more memory-efficient string type
> enables you to keep more data in memory and hence potentially obtain
> better performance.

The last time I joined a relevant discussion, I was told that worrying
about a native UCS-4 string type would be pointless, simply because that
sort of thing is really needed only for word processors.

Now I have been informed that Lazarus (and perhaps other IDEs) uses
upwards of 50 MB of string space just to do one of its basic operations.

That leaves me wondering how much performance we lose by endlessly
decompressing UTF-8 data instead of using, say, UCS-4 strings.


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 14:10, Daniël Mantione wrote:


> Therefore, any other encoding is a waste of memory and does not gain you
> any speed. For that reason, I don't see the compiler switching away from
> 8-bit processing either.

I nearly fully agree with you.

Except when a string constant needs to contain non-ASCII chars.
What do we do in these cases?

> Only if you need to process characters (rather than pass them on) is
> UTF-32 a lot faster and simpler.

Yes. If I knew how to write this patch, I'd be working on it right now.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione




On Sun, 23 Nov 2008, listmember wrote:

> On 2008-11-23 14:10, Daniël Mantione wrote:
>
> > Therefore, any other encoding is a waste of memory and does not gain
> > you any speed. For that reason, I don't see the compiler switching
> > away from 8-bit processing either.
>
> I nearly fully agree with you.
>
> Except when a string constant needs to contain non-ASCII chars. What do
> we do in these cases?

The common approach is to do nothing; no processing needs to be done.
I.e. the compiler just passes the bytes on one by one from the source
file to the object file.

For an IDE, this is a little bit more complicated. E.g. searching for a ç
in a source file needs to find both the composed and the decomposed
variant, and in the case of UTF-8, this character can be encoded in 1, 2,
3 or 4 bytes which all need to be found. This is where UTF-16 and UTF-32
start to make sense.

> > Only if you need to process characters (rather than pass them on) is
> > UTF-32 a lot faster and simpler.
>
> Yes. If I knew how to write this patch, I'd be working on it right now.

Unfortunately a UTF-32 string type is not on our roadmap either, so it
would have to be a user contribution.


Daniël

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Jonas Maebe



On 23 Nov 2008, at 13:31, Daniël Mantione wrote:

> For an IDE, this is a little bit more complicated. E.g. searching
> for a ç in a source file needs to find both the composed and the
> decomposed variant, and in the case of UTF-8, this character can be
> encoded in 1, 2, 3 or 4 bytes which all need to be found. This is
> where UTF-16 and UTF-32 start to make sense.


Characters can also be decomposed in UTF-16 and in UTF-32 (for the  
same reasons as in UTF-8).



Jonas

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 12:37:32 +0100
Martin Schreiber [EMAIL PROTECTED] wrote:

> On Sunday 23 November 2008 09.26:35 Graeme Geldenhuys wrote:
> > On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
> > [EMAIL PROTECTED] wrote:
> > > For example, the Lazarus IDE typically holds 50 to 200 MB of
> > > sources in memory. If this were changed to unicodestring (2 bytes
> > > per char), the IDE would need 50 to 200 MB more memory.
> >
> > Ah, and that would probably explain why Martin decided not to
> > pre-parse units in MSEide - for things like code completion etc...
> > MSEide's memory usage would balloon greatly, compared to Lazarus.
>
> MSEide parses the code for code navigation only, and on demand. For
> creating event handlers and the like, the compiled-in RTTI is used.
> I decided not to parse the RTL because I wanted to be independent of
> the source installation, and because I think exact parsing of the
> whole FPC RTL and other libraries is too difficult, and not necessary
> because RTTI provides sufficient information. The parser uses 8-bit
> strings; 16-bit strings are used in the code editor. It is possible
> to work a whole day with MSEide without closing a single file and
> without noticeable loss of speed.

MSEGui is fast and makes sophisticated use of the RTTI.
I also think that the internal format of the (visual) source editor
does not matter much.

But RTTI only contains published classes, does it not?

Does MSEGui read ppu files?


Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione




On Sun, 23 Nov 2008, Jonas Maebe wrote:

> On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
>
> > For an IDE, this is a little bit more complicated. E.g. searching for
> > a ç in a source file needs to find both the composed and the
> > decomposed variant, and in the case of UTF-8, this character can be
> > encoded in 1, 2, 3 or 4 bytes which all need to be found. This is
> > where UTF-16 and UTF-32 start to make sense.
>
> Characters can also be decomposed in UTF-16 and in UTF-32 (for the same
> reasons as in UTF-8).

I am aware of that, but the combining cedilla is not in the
easy-to-process range of UTF-8. In other words, you cannot do

if char[i]=combining_cedille in UTF-8.

Instead, in UTF-8 you need to make sure the string has enough characters
left, and then compare multiple characters. Heck, you even need to take
care of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.
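
A minimal sketch of the bounds check and multi-byte comparison just
described (not code from this thread): a hand-written decoder that
reads the code point starting at a given byte index. DecodeUtf8At is a
name made up for this illustration, and the sketch does not reject
overlong forms or surrogates. For reference, in well-formed UTF-8 the
combining cedilla U+0327 is the two bytes CC A7.

```pascal
program Utf8Cedilla;
{$mode objfpc}{$H+}

const
  CombiningCedilla = $0327; // U+0327 COMBINING CEDILLA

// Decodes the code point that starts at S[Index]. Returns False if the
// lead byte is invalid, a continuation byte is malformed, or the string
// ends before the sequence is complete.
function DecodeUtf8At(const S: AnsiString; Index: SizeInt;
  out CodePoint: Cardinal; out ByteLen: Integer): Boolean;
var
  B: Byte;
  I: Integer;
begin
  Result := False;
  CodePoint := 0;
  ByteLen := 0;
  if (Index < 1) or (Index > Length(S)) then
    Exit;
  B := Ord(S[Index]);
  if B < $80 then
    begin ByteLen := 1; CodePoint := B; end
  else if (B and $E0) = $C0 then
    begin ByteLen := 2; CodePoint := B and $1F; end
  else if (B and $F0) = $E0 then
    begin ByteLen := 3; CodePoint := B and $0F; end
  else if (B and $F8) = $F0 then
    begin ByteLen := 4; CodePoint := B and $07; end
  else
    Exit;                               // stray continuation or bad lead byte
  if Index + ByteLen - 1 > Length(S) then
    Exit;                               // not enough bytes left
  for I := 1 to ByteLen - 1 do
  begin
    B := Ord(S[Index + I]);
    if (B and $C0) <> $80 then
      Exit;                             // not a continuation byte
    CodePoint := (CodePoint shl 6) or (B and $3F);
  end;
  Result := True;
end;

var
  S: AnsiString;
  CP: Cardinal;
  Len: Integer;
begin
  S := 'c' + Chr($CC) + Chr($A7);       // 'c' followed by U+0327
  if DecodeUtf8At(S, 2, CP, Len) and (CP = CombiningCedilla) then
    WriteLn('combining cedilla found, ', Len, ' bytes long');
end.
```

With a 4-byte character type the same test would be a single
comparison of one array element, which is the convenience being argued
for here.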


Daniël

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 14:11:50 +0200
listmember [EMAIL PROTECTED] wrote:

[...]
> > For very large projects, that should probably be done anyway at some
> > point. But even in that case, using a more memory-efficient string
> > type enables you to keep more data in memory and hence potentially
> > obtain better performance.
>
> The last time I joined a relevant discussion, I was told that worrying
> about a native UCS-4 string type would be pointless, simply because
> that sort of thing is really needed only for word processors.
>
> Now I have been informed that Lazarus (and perhaps other IDEs) uses
> upwards of 50 MB of string space just to do one of its basic
> operations.
>
> That leaves me wondering how much performance we lose by endlessly
> decompressing UTF-8 data instead of using, say, UCS-4 strings.

I'm wondering what you mean by 'endlessly decompressing UTF-8
data'.
You have to make a compromise between memory, ease of use and
compatibility. There is no solution without drawbacks.

If you want to process large 8-bit text files, then UTF-8 is better.
If you want to paint glyphs, then normalized UTF-32 is better.
If you want some Unicode with some memory overhead, some ease of use
and compiler support for some compatibility, then UTF-16 is better.

Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Martin Schreiber

On Sunday 23 November 2008 09.26:35 Graeme Geldenhuys wrote:
> On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner
> [EMAIL PROTECTED] wrote:
> > For example, the Lazarus IDE typically holds 50 to 200 MB of sources in
> > memory. If this were changed to unicodestring (2 bytes per char), the
> > IDE would need 50 to 200 MB more memory.
>
> Ah, and that would probably explain why Martin decided not to
> pre-parse units in MSEide - for things like code completion etc...
> MSEide's memory usage would balloon greatly, compared to Lazarus.

MSEide parses the code for code navigation only, and on demand. For
creating event handlers and the like, the compiled-in RTTI is used.
I decided not to parse the RTL because I wanted to be independent of
the source installation, and because I think exact parsing of the whole
FPC RTL and other libraries is too difficult, and not necessary because
RTTI provides sufficient information.
The parser uses 8-bit strings; 16-bit strings are used in the code
editor. It is possible to work a whole day with MSEide without closing
a single file and without noticeable loss of speed.

Martin

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


> I thought my example described just that. If strings use 4 bytes per
> char then ASCII text will need 4 times more memory.

I am not disputing that.

What I am curious about is: 4 times of what?

It is not hard to tell that an app that works with text files (such as
Lazarus) will consume 4 times more memory per file loaded.

But how much memory does, say, Lazarus itself consume specifically
for string storage when run for the first time?

This is what I am after.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 12:50, Jonas Maebe wrote:


> On 23 Nov 2008, at 11:29, listmember wrote:
>
> > It is not hard to tell that an app that works with text files (such
> > as Lazarus) will consume 4 times more memory per file loaded.
> >
> > But how much memory does, say, Lazarus itself consume
> > specifically for string storage when run for the first time?
>
> From Mattias' original answer: "For example, the Lazarus IDE typically
> holds 50 to 200 MB of sources in memory."
>
> I.e., at least 4 times 50 to 200 MB.

This is my thick day. So, permit me to ask this:

Are you really saying that strings occupy 50 MB of Lazarus's memory
footprint?

I just checked (using Process Explorer, under Windows) and this is what
I see:

Working set: 2,216 K
Peak Working set: 26,988 K

I can't see where that 50 MB fits into that.




Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 13:35:07 +0200
listmember [EMAIL PROTECTED] wrote:

> > Do a 'find declaration' on an identifier that does not exist. This
> > will explore all units of the uses section.
>
> Now I see what you mean.
>
> But isn't this a design choice: caching all sources in memory for
> speed reasons, as opposed to opening and closing each file on demand?

The codetools do almost everything on demand.
Needed sources are parsed, put together (include files) and cleaned
of dead code (IFDEFs); trees are built and find-declaration results
are cached. This costs a lot of time (more than the compiler doing
the same). But if something changes, the codetools know what to
rebuild, while FPC OTOH has to rebuild everything. The naked source
itself normally takes less than 25%.
For example: you can change the declaration of 'integer'. FPC would now
need to recompile almost every unit. But you will hardly notice
any work by the codetools.
These dependencies are complex and require exclusive access. The
memory belongs to the program, whereas the source files can be changed
by anyone.
Therefore the files are kept in memory and automatically reloaded if
they change on disk.

> Still, if that is how it works, it is how it works.

Many applications use strings for text files.
As soon as they don't fit into the CPU cache you get a performance
decrease.



Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner

On Sun, 23 Nov 2008 13:49:32 +0100 (CET)
Daniël Mantione [EMAIL PROTECTED] wrote:

 
 
 Op Sun, 23 Nov 2008, schreef Jonas Maebe:
 
 
  On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
 
  For an IDE, this is a little bit more complicated. I.e. searching
  for a ç in a source file needs to find both the composed and the
  decomposed variant, and in the case of UTF-8, this character can
  be encoded in 1, 2, 3 or 4 bytes which all need to be found. This
  is where UTF-16 and UTF-32 start to make sense.
 
  Characters can also be decomposed in UTF-16 and in UTF-32 (for the
  same reasons as in UTF-8).
 
 I am aware of that, but the combining cedille is not in the easy to 
 process range of UTF-8. In other words, you cannot do
 if char[i]=combining_cedille in UTF-8.
 
 Instead UTF-8, you need to make sure the string has enough characters 
 left, and then compare multiple characters. Heck, you even need to
 take care of the fact the the combining cedille can be encoded in 2,
 3 or 4 bytes.

Which means that there are three different Unicode codes for this
character, which means a single if-equal does not work in UTF-16 or
UTF-32 either.

if UTF8CharacterToUnicode(@s[i],CharLen) in
[cedille1,cedille2,cedille3] then
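
To make the comparison above concrete, here is a minimal self-contained
sketch. DecodeUtf8At is a hypothetical stand-in for UTF8CharacterToUnicode,
and the three constants are only a guess at which code points are meant,
since the thread does not name them. A case statement is used instead of a
set because Pascal set elements are limited to ordinals 0..255 and U+0327
lies outside that range.

```pascal
program CedillaCompare;
{$mode objfpc}{$H+}

const
  // Assumed candidates; the thread does not say which code points are meant.
  SpacingCedilla   = $00B8;
  CombiningCedilla = $0327;
  SmallCCedilla    = $00E7;

// Hypothetical helper: decodes the UTF-8 sequence starting at S[Idx] and
// returns its code point; CharLen receives the number of bytes consumed.
// Assumes well-formed UTF-8; a stray or truncated byte is returned as-is.
function DecodeUtf8At(const S: AnsiString; Idx: Integer;
  out CharLen: Integer): Cardinal;
var
  B: Byte;
  I: Integer;
begin
  B := Ord(S[Idx]);
  if B < $80 then begin CharLen := 1; Result := B; end
  else if (B and $E0) = $C0 then begin CharLen := 2; Result := B and $1F; end
  else if (B and $F0) = $E0 then begin CharLen := 3; Result := B and $0F; end
  else if (B and $F8) = $F0 then begin CharLen := 4; Result := B and $07; end
  else begin CharLen := 1; Result := B; end;
  // Not enough bytes left: treat the lead byte as a single unit.
  if Idx + CharLen - 1 > Length(S) then
  begin
    CharLen := 1;
    Result := B;
    Exit;
  end;
  // Fold in the 6 payload bits of each continuation byte.
  for I := 1 to CharLen - 1 do
    Result := (Result shl 6) or (Ord(S[Idx + I]) and $3F);
end;

var
  S: AnsiString;
  I, CharLen: Integer;
begin
  S := 'c' + #$CC#$A7 + 'a';   // 'c', U+0327 combining cedilla, 'a'
  I := 1;
  while I <= Length(S) do
  begin
    case DecodeUtf8At(S, I, CharLen) of
      SpacingCedilla, CombiningCedilla, SmallCCedilla:
        WriteLn('cedilla variant at byte ', I);
    end;
    Inc(I, CharLen);
  end;
end.
```

The same case statement would work unchanged on UTF-16 or UTF-32 data once
a code point has been extracted, which is the point made above: the
comparison against several code points is needed in every encoding.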


Mattias

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Sergei Gorelkin


Daniël Mantione wrote:


Instead, in UTF-8 you need to make sure the string has enough characters
left, and then compare multiple characters. Heck, you even need to take
care of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.


In this example it may be more efficient to encode three variants of
cedilla into UTF-8 and do three searches with Pos(), instead of decoding
the whole target string. It depends, of course - at least on how long
the target string is.
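
A rough sketch of that idea, assuming the variants of interest are the
precomposed c-cedilla and the decomposed "c" plus combining cedilla (the
constant names and the sample text are made up for illustration). The
needles are written as byte escapes so the source file encoding does not
matter.

```pascal
program CedillaPos;
{$mode objfpc}{$H+}

const
  // UTF-8 byte sequences of the two assumed variants.
  ComposedC   = #$C3#$A7;      // U+00E7, precomposed c with cedilla
  DecomposedC = 'c'#$CC#$A7;   // U+0063 followed by U+0327

var
  Line: AnsiString;
begin
  // Sample text that happens to use the decomposed form.
  Line := 'gar' + DecomposedC + 'on';
  // Pos works on bytes, so no decoding of Line is needed.
  WriteLn('composed at byte:   ', Pos(ComposedC, Line));
  WriteLn('decomposed at byte: ', Pos(DecomposedC, Line));
end.
```

Because UTF-8 lead bytes and continuation bytes never overlap, a byte-wise
Pos() on well-formed UTF-8 cannot match in the middle of another character.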


Regards,
Sergei

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Jonas Maebe



On 23 Nov 2008, at 12:35, listmember wrote:


Do a 'find declaration' on an identifier that does not exist. This
will explore all units of the uses section.


Now I see what you mean.

But isn't this a design choice: caching all sources in memory for
speed reasons, as opposed to opening and closing each file on demand?


For very large projects, that should probably be done anyway at some  
point. But even in that case, using a more memory-efficient string  
type enables you to keep more data in memory and hence potentially  
obtain better performance.



Jonas

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort

In our previous episode, listmember said:
 Is there a way to determine how much memory is consumed by strings by a 
 running application?

Maybe you can keep a counter in the routines of astrings.  Increase/adjust on
newansistring or setlength.
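
A counter inside the RTL is one route. As a rougher alternative that needs
no RTL changes, heap usage can be sampled before and after a block of
string allocations. The sketch below assumes FPC's GetFPCHeapStatus; the
array size and the 80-byte line length are made up for illustration, and
the difference also includes allocator overhead and anything else allocated
in between, so it is an upper estimate rather than an exact string count.

```pascal
program StringHeap;
{$mode objfpc}{$H+}

const
  LineCount = 10000;   // illustrative values only
  LineLen   = 80;

var
  Status: TFPCHeapStatus;
  Before, After: PtrUInt;
  Lines: array of AnsiString;
  I: Integer;
begin
  // Allocate the array itself first so it is not counted below.
  SetLength(Lines, LineCount);

  Status := GetFPCHeapStatus;
  Before := Status.CurrHeapUsed;

  for I := 0 to High(Lines) do
  begin
    SetLength(Lines[I], LineLen);            // one heap block per string
    FillChar(Lines[I][1], LineLen, 'x');
  end;

  Status := GetFPCHeapStatus;
  After := Status.CurrHeapUsed;

  WriteLn('payload bytes:   ', LineCount * LineLen);
  WriteLn('heap bytes used: ', After - Before);
end.
```

The gap between the two numbers gives an idea of the per-string overhead
(header, terminator and allocator rounding) for this particular pattern.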
 
 I'd like to know this, in particular, for FPC and Lazarus --to begin with.
 
 And, the reason I'd like to know this is this: Whenever I suggest that
 char size be increased to 4, the idea gets opposed on the grounds that it
 will need huge memory --4 times as much.

That's not the only reason:
- more memory also means slower copy.
- Most OSes seem to use UTF-8 and UTF-16; with UTF-32 you would be an island,
and the average text editor might not be able to read what you write.
 
 There's of course some merit in that argument, but I have no idea what
 it is '4 times' of.

 This is not very engineer-like --it being unmeasured.

It is highly dependent on use. A measurement on a single application says
nothing.

The app that I work on for a living has maybe 0.5MB of strings, and hardly
any time consuming processing. (mostly a simple logfile).

In previous jobs however I have done database-in-memory, database pumps and
importers, and there it matters.
 
 Can anyone suggest a way to measure the memory load caused by strings?


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort

In our previous episode, listmember said:
 The last time I joined a relevant discussion, I was told worrying about 
 native UCS-4 string-type would be pointless simply because that sort of 
 thing is really needed for word processors only.
 
 Now, I have been informed that Lazarus (and perhaps other IDEs) use 
 upwards of 50 MB string space just to do one of their basic operations.
 
 That leaves me wondering how much do we lose performance-wise in 
 endlessly decompressing UTF-8 data, instead of using, say, UCS-4 strings.

If you leave character composition aside, you don't need to decode at all
for often-used primitives such as comparing an identifier.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 14:34, Mattias Gaertner wrote:

On Sun, 23 Nov 2008 14:11:50 +0200
listmember[EMAIL PROTECTED]  wrote:



That leaves me wondering how much do we lose performance-wise in
endlessly decompressing UTF-8 data, instead of using, say, UCS-4
strings.


I'm wondering what you mean with 'endlessly decompressing UTF-8
data'.


I am referring to going to the nth character in a string. With UTF-8 it
is no longer simple arithmetic and an index operation. You have to start
from zero and iterate until you get to your character --at every step,
calculating whether it is 2, 3 or 4 bytes long. Doing this is decompression.



You have to make a compromise between memory, ease of use and
compatibility. There is no solution without drawbacks.

If you want to process large 8bit text files then UTF-8 is better.
If you want to paint glyphs then normalized UTF-32 is better.
If you want some unicode with some mem overhead and some easy usage and
have compiler support for some compatibility then UTF-16 is better.


Do we have to think in terms of encodings (which are ways of
compressing text) when what we actually mean is 1-byte, 2-byte and 4-byte
per char strings?
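
To put numbers on the memory side of that trade-off, here is a small
sketch that reports how many bytes the same text needs in each form.
ReportSizes is a hypothetical helper; it assumes well-formed UTF-8 input
and derives the UTF-16 and UTF-32 sizes from the UTF-8 lead bytes.

```pascal
program EncodedSizes;
{$mode objfpc}{$H+}

// Prints the storage needed for S (well-formed UTF-8 assumed)
// as UTF-8, UTF-16 and UTF-32.
procedure ReportSizes(const S: AnsiString);
var
  I, CodePoints, Utf16Units: Integer;
  B: Byte;
begin
  CodePoints := 0;
  Utf16Units := 0;
  for I := 1 to Length(S) do
  begin
    B := Ord(S[I]);
    if (B and $C0) <> $80 then     // not a continuation byte: a new code point
    begin
      Inc(CodePoints);
      if B >= $F0 then
        Inc(Utf16Units, 2)         // outside the BMP: needs a surrogate pair
      else
        Inc(Utf16Units);
    end;
  end;
  WriteLn('UTF-8:  ', Length(S), ' bytes');
  WriteLn('UTF-16: ', Utf16Units * 2, ' bytes');
  WriteLn('UTF-32: ', CodePoints * 4, ' bytes');
end;

begin
  ReportSizes('begin end;');             // ASCII-only source text
  ReportSizes(#$C3#$A7#$E2#$82#$AC);     // U+00E7 and U+20AC
end.
```

For the ASCII line this gives 10, 20 and 40 bytes; for the two non-ASCII
characters it gives 5, 4 and 8 bytes, so the "4 times" figure only holds
for text that is mostly ASCII.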


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 14:19, Mattias Gaertner wrote:

On Sun, 23 Nov 2008 13:35:07 +0200
listmember[EMAIL PROTECTED]  wrote:


[...]

These dependencies are complex and require exclusive access. The
memory belongs to the program, the source files can be changed by
anyone.
Therefore the files are kept in memory and auto reloaded if they
change on disk.


Makes sense. Thank you for explaining it.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 14:49, Daniël Mantione wrote:


On Sun, 23 Nov 2008, Jonas Maebe wrote:


On 23 Nov 2008, at 13:31, Daniël Mantione wrote:


For an IDE, this is a little bit more complicated. I.e. searching for
a ç in a source file needs to find both the composed and the
decomposed variant, and in the case of UTF-8, this character can be
encoded in 1, 2, 3 or 4 bytes which all need to be found. This is
where UTF-16 and UTF-32 start to make sense.


Characters can also be decomposed in UTF-16 and in UTF-32 (for the
same reasons as in UTF-8).


I am aware of that, but the combining cedilla is not in the easy to
process range of UTF-8. In other words, you cannot do
if char[i]=combining_cedille in UTF-8.

Instead, in UTF-8 you need to make sure the string has enough characters
left, and then compare multiple characters. Heck, you even need to take
care of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.


This is one of the million and one small details that one has to keep in 
mind while programming.


What I think would be more sensible is, instead of using all these
variable sizes, to simply use 4-byte/char strings and compose (in the
UTF sense) everything into that string.


You do this once, when importing/loading text into your app. From then on,
everything is just like the good old string --except that it is a 4-byte
per char string, instead of 1-byte.


Now, my question is this: How would I create a 'FourByteString' type, 
reference counted etc. just like the usual 'String'?


How hard is it?

Can someone like me, who does not speak assembler, do it?

If so, where do I begin copypasting from 'string'?
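
For reference, a very rough starting point that needs no assembler: a
dynamic array with one 32-bit element per code point. The compiler already
reference counts dynamic arrays, although unlike 'String' they are not
copy-on-write. TFourByteString and FromUtf8 are hypothetical names, and the
decoder assumes well-formed UTF-8 and does no composition.

```pascal
program FourByte;
{$mode objfpc}{$H+}

type
  // One LongWord per code point; reference counted by the compiler,
  // but assignment shares the data rather than copying on write.
  TFourByteString = array of LongWord;

// Decodes well-formed UTF-8 into one array element per code point.
function FromUtf8(const S: AnsiString): TFourByteString;
var
  I, N, Extra: Integer;
  B: Byte;
  CP: LongWord;
begin
  SetLength(Result, Length(S));   // upper bound: one code point per byte
  I := 1;
  N := 0;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    // The lead byte gives the payload bits and the number of bytes to follow.
    if B < $80 then begin CP := B; Extra := 0; end
    else if B < $E0 then begin CP := B and $1F; Extra := 1; end
    else if B < $F0 then begin CP := B and $0F; Extra := 2; end
    else begin CP := B and $07; Extra := 3; end;
    Inc(I);
    while (Extra > 0) and (I <= Length(S)) do
    begin
      CP := (CP shl 6) or (Ord(S[I]) and $3F);
      Inc(I);
      Dec(Extra);
    end;
    Result[N] := CP;
    Inc(N);
  end;
  SetLength(Result, N);           // trim to the real number of code points
end;

var
  T: TFourByteString;
begin
  T := FromUtf8('a'#$E2#$82#$AC);   // 'a' and U+20AC
  // Element access is a plain index operation, whatever the character.
  WriteLn(Length(T), ' ', T[1]);
end.
```

This prints "2 8364" (U+20AC is 8364 decimal). Everything a real string
type adds on top, such as copy-on-write, concatenation and compiler
conversions, is not covered by this sketch.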

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Martin Schreiber

On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote:

 But RTTI only contains published classes, does it not?

AFAIK there are some more elements where it is possible to get a typeinfo
pointer. A compiler specialist can say more. :-)

 Does MSEGui read ppu files?

No.

Martin

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort

In our previous episode, Martin Schreiber said:
 On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote:
 
  But RTTI only contains published classes, does it not?
 
 AFAIK there are some more elements where it is possible to get a typeinfo
 pointer. A compiler specialist can say more. :-)

Well, I'm not an expert, but I can only think of enumerations. These have
RTTI under Delphi because they are shown in the Object Inspector.

And afaik that's it? 

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 15:10, Marco van de Voort wrote:

In our previous episode, listmember said:


[]..


I'd like to know this, in particular, for FPC and Lazarus --to begin with.

And, the reason I'd like to know this is this: Whenever I suggest that
char size be increased to 4, the idea gets opposed on the grounds that it
will need huge memory --4 times as much.


That's not the only reason:
- more memory also means slower copy.


True. But being multiples of 4 bytes may compensate for it. Don't
quote me on this though.



- Most OSes seem to use UTF-8 and UTF-16; with UTF-32 you would be an island,
and the average text editor might not be able to read what you write.


The answer to that is this:

1) When inputting/outputting text to/from file or the OS, you use UTF-8 
(or whatever is native/required).


2) You do not make UTF-32 mandatory. But, it should be there for those 
(and those cases) that need it.


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione




On Sun, 23 Nov 2008, Marco van de Voort wrote:


In our previous episode, Martin Schreiber said:

On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote:


But RTTI only contains published classes, does it not?


AFAIK there are some more elements where it is possible to get a typeinfo
pointer. A compiler specialist can say more. :-)


Well, I'm not an expert, but I can only think of enumerations. These have
RTTI under Delphi because they are shown in the Object Inspector.

And afaik that's it?


The compiler uses RTTI to copy data structures with dynamic data types
inside. I.e. records have RTTI because there might be a widestring inside,
and the RTL needs it to do e.g. an assignment correctly.


Daniël

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort

In our previous episode, Daniël Mantione said:
  AFAIK there are some more elements where it is possible to get a typeinfo
  pointer. A compiler specialist can say more. :-)
 
  Well, I'm not an expert, but I can only think of enumerations. These have
  RTTI under Delphi because they are shown in the Object Inspector.
 
  And afaik that's it?
 
 The compiler uses RTTI to copy data structures with dynamic data types
 inside. I.e. records have RTTI because there might be a widestring inside,
 and the RTL needs it to do e.g. an assignment correctly.

Ah, didn't know the initializer/finalizer tables can be accessed/walked using
typeinfo too.

Re[2]: [fpc-devel] Memory consumed by strings

2008-11-23 Thread JoshyFun

Hello Daniël,

Sunday, November 23, 2008, 1:49:32 PM, you wrote:


DM I am aware of that, but the combining cedilla is not in the easy to
DM process range of UTF-8. In other words, you cannot do
DM if char[i]=combining_cedille in UTF-8.

DM Instead, in UTF-8 you need to make sure the string has enough characters
DM left, and then compare multiple characters. Heck, you even need to take
DM care of the fact that the combining cedilla can be encoded in 2, 3 or 4
DM bytes.

Combined and uncombined strings are different things for different
tasks; the only common point is that both have the same visual
representation. But a Unicode function like CharAt (or alike) over an
uncombined string must never report the combined character as a
result. Some functions are designed to work over uncombined strings
and others over combined ones, because some things cannot be done over
one of the formats.

-- 
Best regards,
 JoshyFun


Re[2]: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione




On Sun, 23 Nov 2008, JoshyFun wrote:


Combined and uncombined strings are different things for different
tasks; the only common point is that both have the same visual
representation. But a Unicode function like CharAt (or alike) over an
uncombined string must never report the combined character as a
result.


I was not claiming that :) Instead I was saying that Edit-Search in an 
IDE is expected to find both.



Some functions are designed to work over uncombined strings
and others over combined ones, because some things cannot be done over
one of the formats.


True.

Daniël

Re[3]: [fpc-devel] Memory consumed by strings

2008-11-23 Thread JoshyFun

Hello Daniël,

Sunday, November 23, 2008, 5:21:16 PM, you wrote:

 Combined and uncombined strings are different things for different
 tasks; the only common point is that both have the same visual
 representation. But a Unicode function like CharAt (or alike) over an
 uncombined string must never report the combined character as a
 result.
DM I was not claiming that :) Instead I was saying that Edit-Search in an
DM IDE is expected to find both.

Yes, I know, but an Edit-Search, for example in Lazarus, works
over composed data, as it makes no sense to use decomposed data in a
source editor (unless I'm wrong, of course).


-- 
Best regards,
 JoshyFun


Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys

On Sun, Nov 23, 2008 at 3:45 PM, listmember [EMAIL PROTECTED] wrote:

 I am referring to going to the nth character in a string. With UTF-8 it is
 no longer simple arithmetic and an index operation. You have to start from
 zero and iterate until you get to your character --at every step,
 calculating whether it is 2, 3 or 4 bytes long. Doing this is decompression.

Well if the string is well formed UTF-8, the first byte of each
character will tell you how far to jump ahead, so you don't need to
visit each byte.
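
As a sketch of that jump-ahead loop (Utf8CharStart is a hypothetical name,
and well-formed UTF-8 is assumed, so continuation bytes are never inspected
on their own):

```pascal
program NthChar;
{$mode objfpc}{$H+}

// Returns the byte index at which the N-th code point (1-based) starts,
// or 0 if S holds fewer than N code points.
function Utf8CharStart(const S: AnsiString; N: Integer): Integer;
var
  I, Count: Integer;
  B: Byte;
begin
  Result := 0;
  if N < 1 then
    Exit;
  I := 1;
  Count := 0;
  while I <= Length(S) do
  begin
    Inc(Count);
    if Count = N then
    begin
      Result := I;
      Exit;
    end;
    // The lead byte alone tells how many bytes to skip.
    B := Ord(S[I]);
    if B < $80 then Inc(I)
    else if B < $E0 then Inc(I, 2)
    else if B < $F0 then Inc(I, 3)
    else Inc(I, 4);
  end;
end;

begin
  // 'a', U+00E7 (2 bytes), U+20AC (3 bytes), 'z'
  WriteLn(Utf8CharStart('a'#$C3#$A7#$E2#$82#$AC'z', 4));
end.
```

The fourth character starts at byte 7 here. The walk is still linear in
the number of characters, but it touches only one byte per character.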

With UTF-16, you also can't just jump to the n'th character. It also
needs special attention to check for surrogate pairs.
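
A minimal sketch of the surrogate check, working on raw 16-bit code units
so it does not depend on any particular string type (CountCodePoints is a
hypothetical name):

```pascal
program Utf16Count;
{$mode objfpc}{$H+}

// Counts code points in a sequence of UTF-16 code units. A high surrogate
// ($D800..$DBFF) followed by a low surrogate ($DC00..$DFFF) is one code point.
function CountCodePoints(const Units: array of Word): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 0;
  while I <= High(Units) do
  begin
    if (Units[I] >= $D800) and (Units[I] <= $DBFF) and (I < High(Units))
       and (Units[I + 1] >= $DC00) and (Units[I + 1] <= $DFFF) then
      Inc(I, 2)    // surrogate pair: two units, one code point
    else
      Inc(I);      // BMP character (or an unpaired surrogate)
    Inc(Result);
  end;
end;

begin
  // 'a', U+1D11E as the surrogate pair D834 DD1E, 'b'
  WriteLn(CountCodePoints([$0061, $D834, $DD1E, $0062]));
end.
```

Four code units give three code points, so index arithmetic on UTF-16 is
only exact for text that stays inside the BMP.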

At least the good thing of UTF-8 is that you don't have to worry about
LE or BE byte orders. UTF-16 and UTF-32 have that nasty issue.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember


On 2008-11-23 19:31, Graeme Geldenhuys wrote:

At least the good thing of UTF-8 is that you don't have to worry about
LE or BE byte orders. UTF-16 and UTF-32 have that nasty issue.


LE/BE only applies when streaming to/from file/device/network, otherwise 
life is much simpler with UTF-32.


[fpc-devel] Memory consumed by strings

2008-11-22 Thread listmember

Is there a way to determine how much memory is consumed by strings by a 
running application?


I'd like to know this, in particular, for FPC and Lazarus --to begin with.

And, the reason I'd like to know this is this: Whenever I suggest that
char size be increased to 4, the idea gets opposed on the grounds that it
will need huge memory --4 times as much.


There's of course some merit in that argument, but I have no idea what
it is '4 times' of.


This is not very engineer-like --it being unmeasured.

Can anyone suggest a way to measure the memory load caused by strings?