Hi Joerg,
Sorry for the delayed response. I can describe the invocation and the
results, pertaining to static counts. I would also imagine that driver
writers, or anyone wanting to measure IOMMU translation performance, would be
the consumers of this perf capability. Of course, this is my understanding,
which is why I am very interested in the kernel community's comments and
advice. First, to invoke the IOMMUv2 PMU, the following command will suffice:
    ./perf stat -e iommuv2/config=0x8000000000000005,config1=0x0/u <command>
(I have the RAW bit, the MSb, explicitly set.)
The <config> field sets the following:
  CSource [7:0] - Identifies the IOMMUv2 performance metric to be
    counted. In this case 0x05, the total number of peripheral memory
    operations translated.
  DeviceID [23:8] - The PCI BDF identifying the specific device to be
    considered. In this case 0x0000 is the IOMMU itself.
  PASID [39:24] - Optional PASID filter; 0x0000 means no filtering.
  Domain [55:40] - Optional Domain filter; 0x0000 means no filtering.
  en_deviceid_filter [56] - Explicitly enables DeviceID filtering;
    implicitly set if DeviceID is not 0x0000.
  en_pasid_filter [57] - Must be set to enable the optional PASID
    filtering.
  en_domain_filter [58] - Must be set to enable the optional Domain
    filtering.
The <config1> field sets the following (more obscure settings):
  deviceid_mask [15:0] - A bit mask applied to the associated filter
    (match) register, for refining purposes.
  pasid_mask [31:16] - Same as deviceid_mask, pertaining to PASID.
  domain_mask [47:32] - Same as deviceid_mask, pertaining to Domain.
When the IOMMUv2 PMU is invoked, its first task is to verify that a
performance counter (PC) resource is available. The IOMMUv2 PMU uses a soft
register and bit mask, linearized from the bank/counter information populated
in the amd_iommu struct during initialization, to allocate a free bank/counter
and assign it to the perf IOMMU event. The bank/counter information is used,
among other values, to calculate an offset into the IOMMU MMIO region to
access registers, for example ICounter and CSource. So, from the IOMMUv2
driver's perspective (the additional functionality written into
amd_iommu_init.c), once the IOMMUv2 PMU has assigned the counter resource, it
needs to configure the physical IOMMUv2 PC registers. For example:
  1) Allocate an IOMMUv2 bank/counter index; on the first pass the
     assignment is bank=0, counter=0.
  2) At the moment, the code only populates the DevID (PCI BDF) into
     DeviceID; PASID and Domain will be added later. The devid is held
     at 0x0000.
  3) Fxn selects the register within the counter set and is used to
     calculate the counter register offset within the MMIO region. For
     example, CSource is +08h; see Table 70: Counter Bank Addressing (MMIO)
     in the IOMMUv2 2.0 specification.
  4) The value to be written, in the above example, is 0x05, for the
     CSource register.
  5) Since this is a write operation, is_write is true.
  6) Now there is enough information to access the IOMMUv2 PC register(s),
     and perf IOMMUv2 calls into the IOMMU core driver's exported function:
       int amd_iommu_v2_get_set_pc_reg_val(u16 devid, u8 bank, u8 cntr,
                                           u8 fxn, long long *value,
                                           bool is_write);
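The bank/counter allocation described before the steps can be pictured as a
bitmap of linear slots, where slot = bank * counters_per_bank + counter. This
is a hypothetical userspace sketch, not the driver's code; the struct pc_alloc
type, its helpers and its 64-slot limit are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocator state (the driver keeps its own equivalent). */
struct pc_alloc {
	uint64_t     used;     /* bit i set => linear slot i is in use */
	unsigned int banks;    /* reported number of banks */
	unsigned int counters; /* reported counters per bank */
};

/* Returns 0 and fills bank/cntr on success, -1 if no counter is free. */
static int pc_alloc_get(struct pc_alloc *a, uint8_t *bank, uint8_t *cntr)
{
	unsigned int total = a->banks * a->counters;
	unsigned int i;

	if (total > 64)
		total = 64; /* this sketch only tracks 64 slots */

	for (i = 0; i < total; i++) {
		if (!(a->used & (1ULL << i))) {
			a->used |= 1ULL << i;
			/* Un-linearize the slot back into bank/counter. */
			*bank = (uint8_t)(i / a->counters);
			*cntr = (uint8_t)(i % a->counters);
			return 0;
		}
	}
	return -1;
}

/* Release a previously allocated bank/counter pair. */
static void pc_alloc_put(struct pc_alloc *a, uint8_t bank, uint8_t cntr)
{
	a->used &= ~(1ULL << (bank * a->counters + cntr));
}
```

On an empty bitmap the first call returns bank=0, counter=0, matching step 1.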
Most of the IOMMUv2 driver functionality is self-explanatory. The function
above verifies the IOMMUv2 PC capability, calculates the counter set offset
within the IOMMU MMIO region, and verifies that the offset is within the MMIO
region aperture. Once this is done, the function simply writes to the
selected register. Since the numbers of banks and counters are dynamic and
depend on future designs, the limits for MMIO region offset values are
calculated from the reported maximum bank/counter.
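A sketch of the offset calculation and aperture check described above. Only
the CSource offset (+08h) comes from this description; the base offset, the
bank and counter strides, the last valid Fxn offset, and treating the reported
maxima as the highest valid indices are assumptions to be checked against
Table 70 of the specification. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMED layout: counter registers start at MMIO offset 0x40000,
 * each bank spans 0x1000 bytes and each counter within a bank 0x100.
 */
#define PC_BASE         0x40000u
#define PC_BANK_SHIFT   12
#define PC_CNTR_SHIFT   8
#define PC_FXN_LAST     0x28u /* assumed last Fxn register offset */
#define PC_FXN_CSOURCE  0x08u /* CSource is +08h */

/* Offset of register fxn in counter cntr of bank bank. */
static uint32_t pc_offset(uint8_t bank, uint8_t cntr, uint8_t fxn)
{
	return PC_BASE + ((uint32_t)bank << PC_BANK_SHIFT) +
	       ((uint32_t)cntr << PC_CNTR_SHIFT) + fxn;
}

/*
 * Returns 0 and stores the offset if fxn names an 8-byte register and the
 * offset does not exceed the aperture end computed from the reported
 * maxima; returns -1 otherwise.
 */
static int pc_offset_checked(uint8_t bank, uint8_t cntr, uint8_t fxn,
			     uint8_t max_bank, uint8_t max_cntr,
			     uint32_t *offset)
{
	uint32_t off = pc_offset(bank, cntr, fxn);
	uint32_t limit = pc_offset(max_bank, max_cntr, PC_FXN_LAST);

	if (fxn > PC_FXN_LAST || (fxn & 7) || off > limit)
		return -1;
	*offset = off;
	return 0;
}
```

Deriving the limit from the reported maxima, rather than hard-coding it, is
what lets the same check cover designs with more banks or counters.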
Once a non-zero value has been written to the CSource register, ICounter
starts counting the IOMMU events selected by the CSource value.
To stop the counter (ICounter), the CSource register is set to zero (0). So a
perf event accessing the IOMMUv2 PC writes a defined value to the CSource
register, executes a command, writes zero (0) to the CSource register, and
then reads the ICounter value. The count for the specific IOMMU perf event is
the difference between the current ICounter value and the previously read
value; ICounter cannot be reset other than by overflow.
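Because ICounter cannot be reset, the perf side has to work with differences
between successive raw reads. A minimal sketch of that bookkeeping follows;
the 48-bit counter width is an assumption (take the real width from the
specification), and struct pc_event and its helpers are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: ICounter is 48 bits wide. */
#define ICOUNTER_BITS 48
#define ICOUNTER_MASK ((1ULL << ICOUNTER_BITS) - 1)

/*
 * Events counted between two raw ICounter reads. The unsigned
 * subtraction masked to the counter width stays correct across a
 * single wrap-around (overflow) of the counter.
 */
static uint64_t icounter_delta(uint64_t prev, uint64_t cur)
{
	return (cur - prev) & ICOUNTER_MASK;
}

/* Hypothetical per-event state kept by the PMU driver. */
struct pc_event {
	uint64_t prev_raw; /* last raw ICounter value read */
	uint64_t total;    /* accumulated count for this perf event */
};

/* Fold a new raw ICounter read into the event's total. */
static void pc_event_update(struct pc_event *e, uint64_t cur_raw)
{
	e->total += icounter_delta(e->prev_raw, cur_raw);
	e->prev_raw = cur_raw & ICOUNTER_MASK;
}
```

Keeping prev_raw per event, rather than trying to zero the hardware counter,
matches the constraint that ICounter only returns to zero through overflow.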
So, when the example perf command is executed, for example with ls or some
other trivial executable, the result is a count of all IOMMU peripheral
memory operations translated (total). I chose this metric simply to ensure
that the count increments.
Sorry for the long-winded explanation, but we can look at any detail of the
above description that you would like to explore.
BR,
Steve
-----Original Message-----
From: Joerg Roedel [mailto:[email protected]]
Sent: Monday, January 28, 2013 9:37 AM
To: Kinney, Steven
Cc: Thomas Gleixner; Ingo Molnar; H. Peter Anvin; [email protected]; Bjorn
Helgaas; Greg Kroah-Hartman; Sebastian Andrzej Siewior; Myron Stowe; Hiroshi
DOYU; Stephen Warren; Jiri Kosina; Kukjin Kim; [email protected];
[email protected]; Peter Zijlstra; Paul Mackerras; Arnaldo
Carvalho de Melo; Thomas Renninger; Andi Kleen; Cyrill Gorcunov
Subject: Re: [PATCH 1/3] AMD x86 quirks: Quirk for enabling IOMMUv2 PC feature
On Mon, Jan 28, 2013 at 02:59:25PM +0000, Kinney, Steven wrote:
> Testing with perf shows expected results.
Can you give me an impression of what the results look like when perf is used?
Since the hardware isn't widely available yet, I can't try this myself.
Joerg