Hi Andreas,

thank you for your help!

I found OFFCORE_RESPONSE_0:OTHER:NON_DRAM to correlate quite well with
what I see from the uncore CBoxes when counting PCIe traffic. At least
visually the plots show similar behavior; I have not yet tried to
convert the raw counter values into meaningful metrics like bytes/sec
to see if they still correlate.
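In case it is useful, my plan for the conversion is something like the
sketch below. It assumes every counted response corresponds to one
64-byte cache line transfer, which is a guess on my part and may not
hold for MMIO or partial accesses:

```python
# Sketch: turn a raw offcore-response count over a sampling interval
# into an estimated bandwidth. ASSUMPTION: each counted response moved
# one full 64-byte cache line (may be wrong for MMIO/partial accesses).

CACHE_LINE_BYTES = 64

def counts_to_mbps(count, interval_s):
    """Raw event count over an interval -> estimated MB/s."""
    return count * CACHE_LINE_BYTES / interval_s / 1e6

# e.g. 2,000,000 responses counted in a 1-second sample window
print(counts_to_mbps(2_000_000, 1.0))  # -> 128.0 (MB/s)
```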

OFFCORE_RESPONSE has a lot of request and response types to choose
from, so it is difficult for me to tell which are best suited for
counting PCIe/MMIO transactions. OTHER and NON_DRAM are my best
guess right now.
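For reference, my reading of the SDM is that the OFFCORE_RSP_0 MSR
(perf's config1) encodes the request types in bits 0-7 and the
response types in bits 8-15. A small sketch of how the value is
composed; the bit positions are my assumption from the SDM tables, so
please correct me if I got them wrong:

```python
# Sketch: compose the OFFCORE_RSP_0 MSR value (perf's config1) from
# request-type bits (0-7) and response-type bits (8-15).
# ASSUMPTION: bit positions are my reading of the Westmere SDM tables.

REQUEST = {
    "DMND_DATA_RD": 1 << 0, "DMND_RFO": 1 << 1, "DMND_IFETCH": 1 << 2,
    "WB": 1 << 3, "PF_DATA_RD": 1 << 4, "PF_RFO": 1 << 5,
    "PF_IFETCH": 1 << 6, "OTHER": 1 << 7,
}
RESPONSE = {
    "UNCORE_HIT": 1 << 8, "OTHER_CORE_HIT_SNP": 1 << 9,
    "OTHER_CORE_HITM": 1 << 10, "REMOTE_CACHE_FWD": 1 << 12,
    "REMOTE_DRAM": 1 << 13, "LOCAL_DRAM": 1 << 14, "NON_DRAM": 1 << 15,
}

def offcore_rsp(requests, responses):
    """OR together the selected request- and response-type bits."""
    val = 0
    for r in requests:
        val |= REQUEST[r]
    for r in responses:
        val |= RESPONSE[r]
    return val

# Cross-check against the libpfm4 output quoted below:
# DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:REMOTE_DRAM -> config1 0x2033
print(hex(offcore_rsp(
    ["DMND_DATA_RD", "DMND_RFO", "PF_DATA_RD", "PF_RFO"],
    ["REMOTE_DRAM"])))                            # -> 0x2033

# OTHER:NON_DRAM, the combination I am currently using:
print(hex(offcore_rsp(["OTHER"], ["NON_DRAM"])))  # -> 0x8080
```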

Figure 18-26 in the SDM shows a MSR_UNCORE_ADDR_OPCODE_MATCH register
with which I can use the uncore PMUs to filter transactions for
specific physical addresses. If it could filter address ranges, I
could filter for the physical address space of a specific PCIe device,
which would take me where I want to go. But it looks like I can only
filter for specific addresses.
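To get the physical address range of a device in the first place, I
would read its BARs from /sys/bus/pci/devices/&lt;bdf&gt;/resource, which
has one "start end flags" line per region. A quick parsing sketch (the
sample resource contents below are made up):

```python
# Sketch: parse the BAR ranges a PCIe device claims, as exported by
# /sys/bus/pci/devices/<bdf>/resource (one "start end flags" line per
# region; unused regions are all zeros). SAMPLE_RESOURCE is made up.

SAMPLE_RESOURCE = """\
0x00000000fb000000 0x00000000fb3fffff 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

def parse_bars(text):
    """Return a list of (start, end) physical address ranges."""
    ranges = []
    for line in text.splitlines():
        start, end, _flags = (int(field, 16) for field in line.split())
        if end > start:  # skip unused (all-zero) regions
            ranges.append((start, end))
    return ranges

def addr_in_device(addr, ranges):
    """True if a physical address falls inside any of the device's BARs."""
    return any(start <= addr <= end for start, end in ranges)

bars = parse_bars(SAMPLE_RESOURCE)
print([(hex(s), hex(e)) for s, e in bars])  # -> [('0xfb000000', '0xfb3fffff')]
print(addr_in_device(0xfb100000, bars))     # -> True
```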

I need to have a closer look at it tomorrow, maybe there still is a way :)

Cheers,
Andre

2014/1/15 Andreas Hollmann <hollm...@in.tum.de>:
> Hi Andre,
>
> you could take a look at the offcore counters. These counters are
> per CPU and offer the possibility of filtering certain events.
>
> I cannot tell you if they are really suited for your needs,
> but you could give them a try.
>
> The best documentation on offcore counters is this:
> http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
>
> and the Intel SDM Volume 3:
> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
>
> I would also advise you to use libpfm4 to translate event names like
>
> OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_DRAM
>
> into raw events for perf. Here is an overview of how to use
> libpfm4 with perf:
>
> http://www.bnikolic.co.uk/blog/hpc-prof-events.html
>
> It doesn't cover the use of offcore counters, but here is one example:
>
> check_events OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_DRAM
> Supported PMU models:
>         [7, netburst, "Pentium4"]
>         [8, netburst_p, "Pentium4 (Prescott)"]
>         [11, core, "Intel Core"]
>         [14, atom, "Intel Atom"]
>         [15, nhm, "Intel Nehalem"]
>         [16, nhm_ex, "Intel Nehalem EX"]
>         [17, nhm_unc, "Intel Nehalem uncore"]
>         [18, ix86arch, "Intel X86 architectural PMU"]
>         [51, perf, "perf_events generic PMU"]
>         [52, wsm, "Intel Westmere (single-socket)"]
>         [53, wsm_dp, "Intel Westmere DP"]
>         [54, wsm_unc, "Intel Westmere uncore"]
>         [55, amd64_k7, "AMD64 K7"]
>         [56, amd64_k8_revb, "AMD64 K8 RevB"]
>         [57, amd64_k8_revc, "AMD64 K8 RevC"]
>         [58, amd64_k8_revd, "AMD64 K8 RevD"]
>         [59, amd64_k8_reve, "AMD64 K8 RevE"]
>         [60, amd64_k8_revf, "AMD64 K8 RevF"]
>         [61, amd64_k8_revg, "AMD64 K8 RevG"]
>         [62, amd64_fam10h_barcelona, "AMD64 Fam10h Barcelona"]
>         [63, amd64_fam10h_shanghai, "AMD64 Fam10h Shanghai"]
>         [64, amd64_fam10h_istanbul, "AMD64 Fam10h Istanbul"]
>         [68, snb, "Intel Sandy Bridge"]
>         [69, amd64_fam14h_bobcat, "AMD64 Fam14h Bobcat"]
>         [70, amd64_fam15h_interlagos, "AMD64 Fam15h Interlagos"]
>         [71, snb_ep, "Intel Sandy Bridge EP"]
>         [72, amd64_fam12h_llano, "AMD64 Fam12h Llano"]
>         [73, amd64_fam11h_turion, "AMD64 Fam11h Turion"]
>         [74, ivb, "Intel Ivy Bridge"]
>         [76, snb_unc_cbo0, "Intel Sandy Bridge C-box0 uncore"]
>         [77, snb_unc_cbo1, "Intel Sandy Bridge C-box1 uncore"]
>         [78, snb_unc_cbo2, "Intel Sandy Bridge C-box2 uncore"]
>         [79, snb_unc_cbo3, "Intel Sandy Bridge C-box3 uncore"]
>         [80, snbep_unc_cbo0, "Intel Sandy Bridge-EP C-Box 0 uncore"]
>         [81, snbep_unc_cbo1, "Intel Sandy Bridge-EP C-Box 1 uncore"]
>         [82, snbep_unc_cbo2, "Intel Sandy Bridge-EP C-Box 2 uncore"]
>         [83, snbep_unc_cbo3, "Intel Sandy Bridge-EP C-Box 3 uncore"]
>         [84, snbep_unc_cbo4, "Intel Sandy Bridge-EP C-Box 4 uncore"]
>         [85, snbep_unc_cbo5, "Intel Sandy Bridge-EP C-Box 5 uncore"]
>         [86, snbep_unc_cbo6, "Intel Sandy Bridge-EP C-Box 6 uncore"]
>         [87, snbep_unc_cbo7, "Intel Sandy Bridge-EP C-Box 7 uncore"]
>         [88, snbep_unc_ha, "Intel Sandy Bridge-EP HA uncore"]
>         [89, snbep_unc_imc0, "Intel Sandy Bridge-EP IMC0 uncore"]
>         [90, snbep_unc_imc1, "Intel Sandy Bridge-EP IMC1 uncore"]
>         [91, snbep_unc_imc2, "Intel Sandy Bridge-EP IMC2 uncore"]
>         [92, snbep_unc_imc3, "Intel Sandy Bridge-EP IMC3 uncore"]
>         [93, snbep_unc_pcu, "Intel Sandy Bridge-EP PCU uncore"]
>         [94, snbep_unc_qpi0, "Intel Sandy Bridge-EP QPI0 uncore"]
>         [95, snbep_unc_qpi1, "Intel Sandy Bridge-EP QPI1 uncore"]
>         [96, snbep_unc_ubo, "Intel Sandy Bridge-EP U-Box uncore"]
>         [97, snbep_unc_r2pcie, "Intel Sandy Bridge-EP R2PCIe uncore"]
>         [98, snbep_unc_r3qpi0, "Intel Sandy Bridge-EP R3QPI0 uncore"]
>         [99, snbep_unc_r3qpi1, "Intel Sandy Bridge-EP R3QPI1 uncore"]
>         [100, knc, "Intel Knights Corner"]
>         [103, ivb_ep, "Intel Ivy Bridge EP"]
>         [104, hsw, "Intel Haswell"]
>         [105, ivb_unc_cbo0, "Intel Ivy Bridge C-box0 uncore"]
>         [106, ivb_unc_cbo1, "Intel Ivy Bridge C-box1 uncore"]
>         [107, ivb_unc_cbo2, "Intel Ivy Bridge C-box2 uncore"]
>         [108, ivb_unc_cbo3, "Intel Ivy Bridge C-box3 uncore"]
> Detected PMU models:
>         [18, ix86arch, "Intel X86 architectural PMU"]
>         [51, perf, "perf_events generic PMU"]
>         [53, wsm_dp, "Intel Westmere DP"]
> Total events: 3042 available, 177 supported
> Requested Event: OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_DRAM
> Actual    Event:
> wsm_dp::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0
> PMU            : Intel Westmere DP
> IDX            : 111149145
> Codes          : 0x5301b7 0x2033
>
> The codes are the configs for using the event with perf.
>
> Now you can use these offcore counters with perf stat; the first
> number is config, the second is config1:
>
> [hollmann@inwest format]$ perf stat -e
> cpu/config=0x5301b7,config1=0x2033,name=Remote_DRAM_Accesses/ ls
> any  cmask  edge  event  inv  ldlat  offcore_rsp  pc  umask
>
>  Performance counter stats for 'ls':
>
>                 46 Remote_DRAM_Accesses
>
>        0.001096052 seconds time elapsed
>
>
>
> [hollmann@inwest format]$ showevtinfo offcore
>
> returns the line
>
> Umask-20 : 0x8000 : PMU : [NON_DRAM] : None : Response: Non-DRAM
> requests that were serviced by IOH
>
> which could be useful in your case.
>
> Best regards,
> Andreas
>
> 2014/1/15 Andre Richter <andre.o.rich...@gmail.com>:
>> Hello everyone,
>>
>> I am currently fiddling around with performance monitoring on a Xeon
>> machine with an E5-2600 series CPU.
>>
>> From what I understand from Intel's uncore performance monitoring
>> guide and from Andi Kleen's READMEs for his super useful pmu-tools,
>> it is not possible to differentiate PCIe traffic per core when using
>> the uncore PMUs (the CBo boxes, to be precise); it is only possible
>> per socket.
>>
>> I wonder, however, if it would be possible to at least make an
>> educated guess of which core is producing how much PCIe or MMIO
>> traffic, maybe by cross-correlating with some PMU events from the
>> core-resident PMUs? The number of PMU events and configuration
>> options is quite overwhelming, which is why I would appreciate any
>> hints I can get :)
>>
>> Cheers,
>> Andre
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-perf-users" 
>> in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html