Re: [fpc-pascal] Effective memory allocation

2014-11-04 Thread Bruno Krayenbuhl
Relooking at your timings and mine, it appears that you allocate 10x my count 
of register-size count of items and require 10x the FillChar which you need to 
initialize your filter array.

My timing is about 80 ms and yours looks like 900 ms for 10x more register 
sized data, which look like the reasonable ratio since we may have difference 
in the way we get the timings  (my timing routines beeing maybe a bit 
optimistic).

Here I think the speed limit is the time it takes to effectively transfer the 
data to RAM and that whether you have a FillChar or FillQWord that ends up 
beeing STOSB or STOSQ, that is fully cached inside the CPU thus the limiting 
factor is the time it takes to move the initialized CPU cached data to the RAM.___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-03 Thread Bruno Krayenbuhl
My results :

_Ptr:=GetMem(1)18 mus, 824 ns / GetMem
_Ptr:=GetMem(1) + FillChar(_Ptr^,1,0)); 81 ms / GetMem + 
FillChar

var
  ArInt:array of int32;
.
SetLength(ArInt, 1 shr 2); 81 ms / SetLength

All timings are variable within [time, time+8%] on repeated runs as is 
generally the case for system timings.

SetLength 0's (Internally calls FillChar) the array ArInt so to do a identical 
comparison you need to add the FillChar to the GetMem if it is your intention 
to 0 the array before using it.

You have to compare things that do the same thing in the end.

Tested on XP Pro Win32 Sp3
Intel Core 2CPU
6400 #@ 2.13GHz
2.05 GHr., 1.00 Gb RAM
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-03 Thread Sven Barth
Am 03.11.2014 08:32 schrieb Nico Erfurth n...@erfurth.eu:

 On 02.11.14 15:05, Xiangrong Fang wrote:
  Sorry, the results in previous mail was mis-labeled.
 
  ​
  The result is:
 
  Using SetLength:
 
  Alloc:  9.40781697E-0001
  Clear:  2.13420202E-0001
 
  Using GetMemory:
 
  Alloc:  2.8100E-0005
  Clear:  7.74975504E-0001


 SetLength will already clear the array before returning it, so the time
 for SetLength is approximate time(GetMem) + time(Clear).

Ah, right. I forgot that :)

Regards,
Sven
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-03 Thread Adriaan van Os

Xiangrong Fang wrote:

Hi All,

I am programming a Bloom Filter and need a high-performance way to 


On what platform are you doing this ?

Regards,

Adriaan van Os

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] Effective memory allocation

2014-11-03 Thread Xiangrong Fang
2014-11-03 23:40 GMT+08:00 Adriaan van Os f...@microbizz.nl:

 Xiangrong Fang wrote:

 Hi All,

 I am programming a Bloom Filter and need a high-performance way to


 On what platform are you doing this ?


​I am programming on Linux, but it will be used on both Windows and Linux,
Windows is the primary target.​
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-03 Thread Nico Erfurth
Hi,

 Hi All,
 
 I am programming a Bloom Filter and need a high-performance way to
 
 On what platform are you doing this ?
 
 ​I am programming on Linux, but it will be used on both Windows and
 Linux, Windows is the primary target.​


Well, the first thing you should ask yourself is Do I REALLY need such
a large bloom filter. Everything larger than the last level cache will
seriously harm your performance as you are going to trigger a lot of
cache and TLB misses. In general you should target for typical L1-Cache
sizes (16-32KByte) or if REALLY necessary L2-Cache-sizes
(256KByte-32MByte). Everything above that will make the performance go
down rapidly, especially in hashing-application the CPU can't prefetch
data properly as the access-patterns are erratic.

Nico
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-03 Thread Xiangrong Fang
2014-11-04 6:35 GMT+08:00 Nico Erfurth n...@erfurth.eu:


 Well, the first thing you should ask yourself is Do I REALLY need such
 a large bloom filter. Everything larger than the last level cache will
 seriously harm your performance as you are going to trigger a lot of
 cache and TLB misses. In general you should target for typical L1-Cache
 sizes (16-32KByte) or if REALLY necessary L2-Cache-sizes
 (256KByte-32MByte). Everything above that will make the performance go
 down rapidly, especially in hashing-application the CPU can't prefetch
 data properly as the access-patterns are erratic.


​I didn't think of things like cache and TLB misses. Because I try to use
BloomFilter and HASH to avoid repeated calculation ​and/or lookups which
are time consuming. So, I don't think it will harm performance, unless
bloom filter lookup is slower than the calculation, am I right?

Thanks!
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Xiangrong Fang
Sorry, the results in previous mail was mis-labeled.

​
The result is:

Using SetLength:

Alloc:  9.40781697E-0001
Clear:  2.13420202E-0001

Using GetMemory:

Alloc:  2.8100E-0005
Clear:  7.74975504E-0001
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Sven Barth
Am 02.11.2014 14:55 schrieb Xiangrong Fang xrf...@gmail.com:

 Hi All,

 I am programming a Bloom Filter and need a high-performance way to
allocate and wipe large block of memory.  I did the following test:

 program getmem;
 {$mode objfpc}{$H+}
 uses epiktimer;
 const
   SIZE = 1024 * 1024 * 1024;
   CNT = 10;
 var
   a: array of Byte;
   p: Pointer;
   et: TEpikTimer;
   i: Integer;
   t1, t2: TimerData;
 begin
   et := TEpikTimer.Create(nil);
   et.Clear(t1); et.Clear(t2);
   for i := 1 to CNT do begin
 et.Start(t1);
 p := GetMemory(SIZE);
 //SetLength(a, SIZE);
 et.Stop(t1);
 et.Start(t2);
 FillQWord(p^, SIZE div 8, 0);
 //FillQWord(a[0], SIZE div 8, 0);
 et.Stop(t2);
 FreeMem(p);
 //a := nil;
   end;
   WriteLn('Alloc: ', et.Elapsed(t1) / CNT);
   WriteLn('Clear: ', et.Elapsed(t2) / CNT);
 end.

 The result is:

 Using GetMemory:

 Alloc:  9.40781697E-0001
 Clear:  2.13420202E-0001

 Using SetLength:

 Alloc:  2.8100E-0005
 Clear:  7.74975504E-0001

 It is understandable that GetMemory() is faster than SetLength(), but why
the difference in FillQWord()?

If you use SetLength the dynamic array consists not only of the array data,
but also of an information record in front of it. This will likely lead to
the data not being aligned correctly (FillQWord works best with 8-Byte
alignment). So what about testing FillDWord or FillChar? Just to see
whether they would be faster.


 Also, I usually use pointer to pass block of memory to functions.  How do
I implement a function with the following signature:

 procedure MyProc(var Buf; Len: Integer):

 I mean, how to handle var Buf inside the procedure body?

You need to cast Buf to a pointer if I remember correctly. Take a look at
the implementation of a TStream's descendant's Write method (e.g.
TStringStream) for an example.

Regards,
Sven
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Constantine Yannakopoulos
On Sun, Nov 2, 2014 at 3:54 PM, Xiangrong Fang xrf...@gmail.com wrote:

 Also, I usually use pointer to pass block of memory to functions.  How do
 I implement a function with the following signature:

 procedure MyProc(var Buf; Len: Integer):

 I mean, how to handle var Buf inside the procedure body?


Use @Buf to get a pointer to the address of the untyped variable. If you
want to cast that variable to an internal known type and the compiler
complains you can always use a pointer cast: PMyKnownType(@Buf)^. You can
also use pointer arithmetic (e.g. (PElementType(@Buf) + n)^) if the
variable is known to be an array.

In Delphi (and probably FPC) it is allowed to pass a nil pointer to an
untyped variable if you need to:

begin
  MyProc(nil^, 0);
end;

The compiler understands what you are trying to do and does not complain
about the blatant nil dereference. So var Buf; is functionally equivalent
to Buf: Pointer;. OTOH const Buf; is slightly different in the sense
that the compiler will not let you pass the untyped variable to another
routine that may change its value.

--
Constantine
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Xiangrong Fang
2014-11-03 2:50 GMT+08:00 Sven Barth pascaldra...@googlemail.com:

 If you use SetLength the dynamic array consists not only of the array
 data, but also of an information record in front of it. This will likely
 lead to the data not being aligned correctly (FillQWord works best with
 8-Byte alignment). So what about testing FillDWord or FillChar? Just to see
 whether they would be faster.

 ​It is quite strange that if I use SetLength+FillByte, it is really faster
(for the FillByte), but if I use GetMemory+FillByte, it is not faster than
using FillDWord or FillQWord.

I found this in FPC doc (for $ALIGN switch):

This switch is recognized for Turbo Pascal Compatibility, but is not yet
implemented. The alignment of data will be different in any case, since
Free Pascalis a 32-bit compiler.

​It is a pity that this switch is not recognized yet. But I need to
understand the last statement, will alignment be different or not?

Thanks!

Xiangrong​
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Sven Barth

On 03.11.2014 02:59, Xiangrong Fang wrote:

2014-11-03 2:50 GMT+08:00 Sven Barth pascaldra...@googlemail.com
mailto:pascaldra...@googlemail.com:

If you use SetLength the dynamic array consists not only of the
array data, but also of an information record in front of it. This
will likely lead to the data not being aligned correctly (FillQWord
works best with 8-Byte alignment). So what about testing FillDWord
or FillChar? Just to see whether they would be faster.

​It is quite strange that if I use SetLength+FillByte, it is really
faster (for the FillByte), but if I use GetMemory+FillByte, it is not
faster than using FillDWord or FillQWord.


Then it's indeed as I suspected. The memory buffer itself that is 
allocated by SetLength is allocated, but the pointer returned to you 
(what you get as a dynamic array) has - at least for FillQWord - a bad 
alignment. Looking at the source though it should still be 8-Byte 
aligned though (the information record has a size of 8 on 32-bit 
systems)... *sigh*


Please also note that not on every system the Fill* methods are 
implemented in assembler. FillChar mostly is, FillWord and FillDWord 
perhaps, but FillQWord mostly seldomly (it's e.g. not even implemented 
that way on x86_64). So especially on 32-bit system 2 32-bit moves are 
used each. Thus FillQWord is not always the optimal choice and I'd 
suggest that you time the different Fill* functions especially on 
different platforms.


Would you mind to show the timings that you got for FillChar? :)


I found this in FPC doc (for $ALIGN switch):

This switch is recognized for Turbo Pascal Compatibility, but is not yet
implemented. The alignment of data will be different in any case, since
Free Pascalis a 32-bit compiler.

​It is a pity that this switch is not recognized yet. But I need to
understand the last statement, will alignment be different or not?


I think that last statement is ment regarding that Turbo Pascal is a 
16-bit compiler which had different alignment values available than FPC 
needs. I don't know what TP supported back then, but I don't imagine 
that it supported e.g. 16-Byte alignment.


Additionally this switch won't help you. The memory buffer that is 
allocated by SetLength is also allocated using GetMemory. So the buffer 
itself *is* aligned. But you don't get the start byte of the buffer in 
case of SetLength, but the first byte after the information record which 
might not be aligned correctly and *no* compiler switch will help you 
with that.


Regards,
Sven
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Nikolay Nikolov

On 11/02/2014 03:54 PM, Xiangrong Fang wrote:

Hi All,

...

Also, I usually use pointer to pass block of memory to functions.  How 
do I implement a function with the following signature:


procedure MyProc(var Buf; Len: Integer):

I mean, how to handle var Buf inside the procedure body?

You can take the address of the Buf variable by using @Buf. For example:

procedure MyProc(var Buf; Len: Integer);
var
  I: Integer;
  P: PByte;
begin
  P := @Buf;
  for I := 0 to Len - 1 do
(P+I)^ := 0;
end;

Nikolay
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Nico Erfurth
On 02.11.14 15:05, Xiangrong Fang wrote:
 Sorry, the results in previous mail was mis-labeled.
 
 ​
 The result is:
 
 Using SetLength:
 
 Alloc:  9.40781697E-0001
 Clear:  2.13420202E-0001
 
 Using GetMemory:
 
 Alloc:  2.8100E-0005
 Clear:  7.74975504E-0001


SetLength will already clear the array before returning it, so the time
for SetLength is approximate time(GetMem) + time(Clear).

As your test allocates a gigabyte of memory the OS will not immediatly
assign physical memory to your process. The pagetable after a the
allocation will contain entries which create faults when accessing the
virtual addresses and then the OS will fill those pages in. So on first
access a lot of time will be spend inside the OSes memory subsystem.

For Linux the newly allocated memory will most likely point to a read
only shared physical page which only contains zeros. On the first write
to one of those virtual addresses the real page allocation will trigger
and Linux assigns a new empty physical page to the virtual memory area.
On Linux you can avoid this by passing MAP_POPULATE to mmap when
allocating the memory.

For SetLength this first access is already done when clearing the memory
inside SetLength. For GetMemory it is done when your explicitly access
the memory by calling FillQWord later.

HTH
Nico
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Effective memory allocation

2014-11-02 Thread Xiangrong Fang
2014-11-03 14:39 GMT+08:00 Sven Barth pascaldra...@googlemail.com:


 Would you mind to show the timings that you got for FillChar? :)


​Using FillChar is always about 5% (or less) faster than FillQWord when
used with GetMemory, but will be around 20%-40% faster if the memory is
allocated by SetLength.

Additionally this switch won't help you. The memory buffer that is
 allocated by SetLength is also allocated using GetMemory. So the buffer
 itself *is* aligned. But you don't get the start byte of the buffer in case
 of SetLength, but the first byte after the information record which might
 not be aligned correctly and *no* compiler switch will help you with that.


​Then the problem remains with SetLength vs. GetMemory... Why SetLength is
about 1x (!) slower than GetMemory?​


​allocating 1G memory took about 0.1 second by SetLength​, but is about
1E-5 via GetMemory.
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal