Thanks.

I still have a question about perf, and because I can't find an answer on the net I am asking here again. I wrote a simple C program that calls CPUID to get information about my laptop's performance monitoring units (putting 0xA in the eax register before calling cpuid).

In the results located in ebx I see a 1 bit for the core cycle event and a 1 bit for the instruction retired event. According to the Intel documentation, a 1 means that the associated event is not available. But I can use these events with perf, and user libraries such as PAPI tell me that they are available.

I have attached my program to this mail. Is there a problem in it? Where is my mistake?

Manu

On 07/02/2013 04:49 PM, Andi Kleen wrote:
On Tue, Jul 02, 2013 at 03:33:18PM +0200, Manuel Selva wrote:
Thanks again for the help. Your answer suggests that the events listed as
Hardware event by perf list are what is called Architectural Events
for Intel processors, doesn't it?
perf uses a superset of the architectural events
(but only a small subset of a full Intel event list)

Also it supports setting the other events in raw form
(or various add-on tools exist to provide them as names)

On my Sandy Bridge Core i5-2520M, perf list reports 10 hardware
events, whereas there are only 7 entries in table 18-1 of the Intel
documentation you mentioned. So I am wondering what these 3
additional events are:
Not all events supported by perf are in sysfs.

event=0x00,umask=0x03 (ref-cycles)
event=0xb1,umask=0x01,inv,cmask=0x01 (stalled-cycles-backend)
event=0x0e,umask=0x01,inv,cmask=0x01 (stalled-cycles-frontend)
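
As a side note, the event=/umask=/inv/cmask fields above combine into a single raw event number. A minimal sketch, assuming the usual x86 event-select layout (event in bits 0-7, umask in bits 8-15, inv in bit 23, cmask in bits 24-31); the helper name raw_config is made up for illustration and is not part of perf:

```c
#include <stdint.h>

/* Hypothetical helper: packs the fields of an x86 event description into
 * one raw config value.
 * Assumed layout: event = bits 0-7, umask = bits 8-15,
 *                 inv   = bit 23,   cmask = bits 24-31. */
uint64_t raw_config(uint8_t event, uint8_t umask, int inv, uint8_t cmask)
{
    return (uint64_t)event
         | ((uint64_t)umask << 8)
         | ((uint64_t)(inv ? 1 : 0) << 23)
         | ((uint64_t)cmask << 24);
}
```

With this layout, raw_config(0x0e, 0x01, 1, 0x01) gives 0x180010e for stalled-cycles-frontend, which should be usable in raw form as `perf stat -e r180010e`.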

Looking at table 19-7 in the same Intel document, I can see non
architectural events for my core i5-2xxx. In this table I can see
that:

ref-cycles                       ==> Can't find it
This is typically called CPU_CLK_UNHALTED.REF_TSC or so
in the Intel documentation.

stalled-cycles-backend ==> Counts total number of uops to be
dispatched per-thread each cycle. Set Cmask = 1, INV = 1 to count
stall cycles.
stalled-cycles-frontend ==> Increments each cycle the # of Uops
issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count
stalled cycles of this core.
These two are very broken. Just ignore them.

-Andi

#include <stdio.h>

int main() {

    unsigned int resultEax;
    unsigned int resultEbx;
    unsigned int resultEdx;

    // Run CPUID with EAX = 0xA (performance monitoring leaf) in a single asm
    // statement: with separate statements the compiler is free to reuse the
    // registers in between, so the values read back may be garbage.
    __asm__ volatile("cpuid"
                     : "=a"(resultEax), "=b"(resultEbx), "=d"(resultEdx)
                     : "a"(0xAU)
                     : "ecx");

    printf("%-82s =  %2u\n", "Version ID of architectural performance monitoring" , resultEax & 255U); // Bits 07 to 00
    printf("%-82s =  %2u\n", "Number of general-purpose performance monitoring counter per logical processor", (resultEax >> 8) & 255); // Bits 15 to 08
    printf("%-82s =  %2u\n", "Bit width of general-purpose performance monitoring counter", (resultEax >> 16) & 255); // Bits 23 to 16
    printf("%-82s =  %2u\n", "Length of EBX bit vector to enumerate architectural performance monitoring events", (resultEax >> 24) & 255); // Bits 31 to 24
    printf("\n");

    printf("%-82s =  %2u\n", "Core cycle event (0 if available, 1 if not)", resultEbx & 1); // Bits 00
    printf("%-82s =  %2u\n", "Instruction retired event (0 if available, 1 if not)", (resultEbx >> 1) & 1); // Bits 01
    printf("%-82s =  %2u\n", "Reference cycles event (0 if available, 1 if not)", (resultEbx >> 2) & 1); // Bits 02
    printf("%-82s =  %2u\n", "Last level cache reference event (0 if available, 1 if not)", (resultEbx >> 3) & 1); // Bits 03
    printf("%-82s =  %2u\n", "Last level cache misses event (0 if available, 1 if not)", (resultEbx >> 4) & 1); // Bits 04
    printf("%-82s =  %2u\n", "Branch instruction retired event (0 if available, 1 if not)", (resultEbx >> 5) & 1); // Bits 05
    printf("%-82s =  %2u\n", "Branch mispredict retired event (0 if available, 1 if not)", (resultEbx >> 6) & 1); // Bits 06
    printf("\n");

    printf("%-82s =  %2u\n", "Number of fixed-function performance counters", (resultEdx >> 0) & 31); // Bits 04 to 00
    printf("%-82s =  %2u\n", "Bit width of fixed-function performance counters", (resultEdx >> 5) & 255); // Bits 12 to 05

    return 0;
}
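
A side note on reading the EBX vector: as far as I understand the Intel documentation, only the low EAX[31:24] bits of EBX are meaningful, and a set bit marks the event as not available. A minimal sketch of that check, with a made-up helper name (arch_event_available) and the register values passed in as plain arguments so it can be tried without executing CPUID:

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if architectural event number `bit`
 * is reported as available by CPUID leaf 0xA, 0 otherwise.
 * eax, ebx: register values returned by CPUID with EAX = 0xA.
 * Assumption: bits at or beyond the vector length in EAX[31:24]
 * are treated as "not available". */
int arch_event_available(unsigned int eax, unsigned int ebx, unsigned int bit)
{
    unsigned int vector_len = (eax >> 24) & 0xFFU; /* EAX bits 31 to 24 */

    assert(bit < 32);          /* EBX only has 32 bits */
    if (bit >= vector_len)
        return 0;              /* outside the enumerated range */
    return ((ebx >> bit) & 1U) == 0; /* 0 means available */
}
```

For example, with eax = 0x07300403 (vector length 7) and ebx = 0, events 0 to 6 are all reported as available, while bit 7 is outside the vector.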
