Dear all,
I'd like to let you know that PEBS sampling of stores seems to behave
badly (at least as far as I understand it) when combined with the -c
flag.
I'm running Linux 3.11.0 on an Intel Sandy Bridge machine with the
following CPU info:
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping : 7
For testing purposes, I'm using the attached program (which simply
copies data from one vector to another) to illustrate the problem.
When I use perf stat to count the loads and stores of this app I get
this output (which I trimmed manually):
$ perf stat -e r81d0 ./a.out   # all loads: event number d0 + umask 81,
                               # per table 19-17 of the Intel manual [1]
671.488.050 loads
$ perf stat -e r82d0 ./a.out   # the same as before, but for stores
356.521.360 stores
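For reference, this is how I build the raw event codes used above: the umask byte goes first, followed by the event number, both in hex. A minimal sketch, where raw_event is only a hypothetical helper name for illustration and not part of perf:

```shell
# Compose a perf raw event code from an Intel event number and umask.
# raw_event is a hypothetical helper, not a perf command.
raw_event() {
    # $1 = event number, $2 = umask (both given as 0x.. hex constants)
    printf 'r%02x%02x\n' "$2" "$1"
}

raw_event 0xd0 0x81   # all loads
raw_event 0xd0 0x82   # all stores
```

This prints r81d0 and r82d0, the two codes passed to perf stat -e.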
This shows that the number of stores is roughly half the number of
loads. However, when I use perf mem record with a sampling period of
10k loads I get the following:
$ perf mem -t load record -c 10000 ./a.out
[perf record: Woken up 1 times to write data]
[perf record: Captured and wrote 0.047 MB perf.data (~2036 samples)]
but when sampling every 10k stores I get:
$ perf mem -t store record -c 10000 ./a.out
...
[perf record: Woken up 4 times to write data]
[perf record: Captured and wrote 0.921 MB perf.data (~40247 samples)]
Notice that the number of samples increased by about 20x, which seems
very odd to me: the number of stores is about half the number of loads,
so I expected roughly 0.5x here. Or is my assumption wrong?
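To make my expectation concrete: assuming perf mem samples the same events that perf stat counted above (an assumption I have not verified), a period of 10000 should give roughly count / 10000 samples. A minimal sketch of that arithmetic, where expected_samples is just a hypothetical helper name:

```shell
# expected_samples is a hypothetical helper: events counted / sampling period.
expected_samples() {
    echo $(( $1 / $2 ))
}

expected_samples 671488050 10000   # loads counted by perf stat
expected_samples 356521360 10000   # stores counted by perf stat

# Observed stores-to-loads ratio of the perf record sample estimates, in percent.
echo $(( 40247 * 100 / 2036 ))
```

Under that assumption I would expect about 67148 load samples and 35652 store samples. The last line prints 1976, i.e. the store run reports about 19.8 times as many samples as the load run.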
Just for further testing, if I omit the -c parameter (which I need
:S), it seems to work better:
$ perf mem -t load record ./a.out
[perf record: Woken up 1 times to write data]
[perf record: Captured and wrote 0.172 MB perf.data (~7508 samples)]
$ perf mem -t store record ./a.out
[perf record: Woken up 1 times to write data]
[perf record: Captured and wrote 0.151 MB perf.data (~6607 samples)]
Best regards.
[1] http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *long_str = "This is a very long string!";
char dest[1024*1024*1024];    /* 1 GiB destination buffer */

int main (int argc, char *argv[])
{
    int i;
    int length = strlen (long_str);

    /* Fill dest with back-to-back copies of long_str. */
    for (i = 0; i < 1024*1024*1024-length; i += length)
        memcpy (&dest[i], long_str, length);

    /* Print a few bytes from the first copies as a sanity check. */
    printf ("CHECK: %c\n", dest[0*length+0]);
    printf ("CHECK: %c\n", dest[1*length+1]);
    printf ("CHECK: %c\n", dest[2*length+2]);
    printf ("CHECK: %c\n", dest[3*length+3]);

    return 0;
}