From:  <[EMAIL PROTECTED]>
Date: Fri, Oct 17, 2008 at 11:06 PM
Subject: [patch 23/24] perfmon3:  kernel documentation

This patch adds the perfmon interface documentation text file
under Documentation.

Signed-off-by: Stephane Eranian <[EMAIL PROTECTED]>
--

Index: o3/Documentation/perfmon.txt
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ o3/Documentation/perfmon.txt        2008-10-16 12:25:49.000000000 +0200
@@ -0,0 +1,206 @@
+              The perfmon hardware monitoring interface
+              ------------------------------------------
+                          Stephane Eranian
+                         <[EMAIL PROTECTED]>
+
+I/ Introduction
+
+   The perfmon interface provides access to the hardware performance counters
+   of major processors. Nowadays, most processors implement some flavor of
+   performance counters, which capture micro-architectural information such
+   as the number of elapsed cycles, the number of cache misses, and so on.
+
+   The interface is implemented as a set of new system calls and a set of
+   config files in /sys.
+
+   It is possible to monitor either a single thread or a single CPU. In
+   either mode, applications can count or sample. System-wide monitoring is
+   supported by running a monitoring session on each CPU. The interface
+   supports event-based sampling, where the sampling period is expressed as a
+   number of occurrences of an event instead of just a timeout. This approach
+   provides finer granularity and more flexibility.
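+
+   As an illustration of event-based sampling, the sketch below assumes a
+   common technique rather than quoting perfmon code: the 64-bit counter is
+   loaded with the two's complement of the period, so it overflows, and a
+   sample is taken, after exactly that many events. The helper names are
+   hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: value to load into a 64-bit logical counter so that
 * it wraps from UINT64_MAX to 0 after exactly `period` events.
 * Unsigned negation yields 2^64 - period.
 */
static uint64_t sampling_reset_value(uint64_t period)
{
    assert(period > 0);
    return (uint64_t)0 - period;
}

/*
 * Simulate `events` occurrences of the monitored event and return how many
 * overflows (samples) they trigger. After each overflow the counter is
 * re-armed with the same period.
 */
static unsigned long simulate_samples(uint64_t period, unsigned long events)
{
    uint64_t counter = sampling_reset_value(period);
    unsigned long samples = 0;

    for (unsigned long i = 0; i < events; i++) {
        if (++counter == 0) {           /* wrapped: overflow, take a sample */
            samples++;
            counter = sampling_reset_value(period);
        }
    }
    return samples;
}
```

+   For example, with a period of 1000, 10500 events yield 10 samples.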
+
+   For performance reasons, it is possible to use a kernel-level sampling
+   buffer to minimize the overhead incurred by sampling. The format of the
+   buffer, what is recorded, how it is recorded, and how it is exported to
+   user space are controlled by a kernel module called a sampling format. The
+   current implementation comes with a default format, but additional formats
+   can be created through a kernel registration interface. Each format is
+   identified by a simple string, which a tool can pass when a monitoring
+   session is created.
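+
+   The registration interface can be pictured as a table of formats keyed by
+   name. The following is a user-space sketch with hypothetical types, names,
+   and sizes; it is not the kernel's actual format API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sampling-format descriptor, identified by its name. */
struct smpl_fmt {
    const char *name;
    /* Example callback: bytes needed for a buffer of nentries samples. */
    size_t (*buf_size)(size_t nentries);
};

#define MAX_FMTS 8

static const struct smpl_fmt *fmt_table[MAX_FMTS];
static size_t fmt_count;

/* Look up a registered format by name; NULL if none matches. */
static const struct smpl_fmt *fmt_find(const char *name)
{
    for (size_t i = 0; i < fmt_count; i++)
        if (strcmp(fmt_table[i]->name, name) == 0)
            return fmt_table[i];
    return NULL;
}

/* Register a format; returns -1 for a duplicate name or a full table. */
static int fmt_register(const struct smpl_fmt *fmt)
{
    assert(fmt != NULL && fmt->name != NULL);
    if (fmt_find(fmt->name) != NULL || fmt_count == MAX_FMTS)
        return -1;
    fmt_table[fmt_count++] = fmt;
    return 0;
}

/* Example format with made-up sizes: 32-byte header, 64-byte entries. */
static size_t example_buf_size(size_t nentries)
{
    return 32 + 64 * nentries;
}

static const struct smpl_fmt example_fmt = { "example", example_buf_size };
```

+   A tool would then refer to a format only by its string, and the kernel
+   side would resolve it with a lookup such as fmt_find().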
+
+   The interface also provides support for event sets and multiplexing to
+   work around hardware limitations in the number of available counters or in
+   how events can be combined. Each set defines as many counters as the
+   hardware can support, and the kernel multiplexes the sets. The interface
+   supports time-based switching as well as overflow-based switching, i.e.,
+   switching after n overflows of designated counters.
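+
+   A minimal model of time-based switching, assuming a simple round-robin
+   policy, is sketched below. The scaling helper shows one common way (an
+   assumption here, not documented perfmon behavior) to estimate a full-run
+   count for a set that was active only part of the time. All names are
+   hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-robin: the set that follows `cur` among `nsets` sets. */
static unsigned next_set(unsigned cur, unsigned nsets)
{
    assert(nsets > 0);
    return (cur + 1) % nsets;
}

/* How many of `slices` timeslices set `set` gets, starting from set 0. */
static unsigned active_slices(unsigned set, unsigned nsets, unsigned slices)
{
    unsigned cur = 0, n = 0;

    for (unsigned i = 0; i < slices; i++) {
        if (cur == set)
            n++;
        cur = next_set(cur, nsets);
    }
    return n;
}

/* Extrapolate a count observed while active to the whole run. */
static uint64_t scale_count(uint64_t observed, uint64_t active, uint64_t total)
{
    if (active == 0)
        return 0;   /* set never ran: no estimate possible */
    return (uint64_t)((double)observed * (double)total / (double)active);
}
```

+   With three sets and ten timeslices, set 0 is active for four slices, so
+   its observed counts would be scaled by a factor of 10/4.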
+
+   Applications never manipulate the actual performance counter registers.
+   Instead, they see a logical Performance Monitoring Unit (PMU) composed of a
+   set of config registers (PMC) and a set of data registers (PMD). Note that
+   PMDs are not necessarily counters; they can also be buffers. The logical
+   PMU is mapped onto the actual PMU using a mapping table implemented as a
+   kernel module. The mapping is defined once for each new processor model,
+   and the active one is visible in /sys/kernel/perfmon/pmu_desc. The kernel
+   module is automatically loaded on first use.
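+
+   Conceptually, a mapping module is a table from logical registers to
+   hardware registers. The sketch below is hypothetical; the entries use the
+   Intel architectural MSR addresses (IA32_PERFEVTSEL0/1 at 0x186/0x187 and
+   IA32_PMC0/1 at 0xc1/0xc2) purely as example values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping-table entry: logical register -> hardware register. */
struct pmu_reg_map {
    const char *logical;  /* logical register name seen by applications */
    const char *hw_name;  /* underlying hardware register */
    uint32_t addr;        /* MSR address on X86 */
};

static const struct pmu_reg_map example_map[] = {
    { "pmc0", "IA32_PERFEVTSEL0", 0x186 },
    { "pmc1", "IA32_PERFEVTSEL1", 0x187 },
    { "pmd0", "IA32_PMC0",        0x0c1 },
    { "pmd1", "IA32_PMC1",        0x0c2 },
};

/* Return the hardware address for a logical register, or -1 if unmapped. */
static long logical_to_addr(const char *logical)
{
    for (size_t i = 0; i < sizeof(example_map) / sizeof(example_map[0]); i++)
        if (strcmp(example_map[i].logical, logical) == 0)
            return (long)example_map[i].addr;
    return -1;
}
```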
+
+   A monitoring session is uniquely identified by a file descriptor obtained
+   when the session is created. Regular file-sharing semantics govern access
+   to the session within a process. A session is never inherited across
+   fork(). The file descriptor is also used to receive notifications, e.g.,
+   on counter overflow or when the sampling buffer is full. It is possible to
+   use poll() or select() on the descriptor to wait for notifications from
+   multiple sessions. The descriptor also supports asynchronous notification
+   via SIGIO.
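+
+   Since the perfmon system calls cannot be exercised without a
+   perfmon-enabled kernel, the sketch below uses pipes as stand-ins for
+   session descriptors to show the poll() pattern and the standard fcntl()
+   calls for SIGIO delivery. The helper names are hypothetical. SIGIO
+   terminates the process by default, so a real tool would install a handler
+   before enabling it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until one of `n` descriptors has a pending notification and return
 * its index, or -1 on timeout or error.
 */
static int wait_any(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfd[16];

    assert(n > 0 && n <= 16);
    for (int i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    if (poll(pfd, (nfds_t)n, timeout_ms) <= 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (pfd[i].revents & POLLIN)
            return i;
    return -1;
}

/* Ask for SIGIO to be sent to this process when `fd` becomes ready. */
static int enable_sigio(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```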
+
+   Counters are always exported as being 64-bit wide regardless of what the
+   underlying hardware implements.
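+
+   One way to export a narrower hardware counter as 64 bits, assumed here
+   for illustration, is to keep the upper bits in software and add 2^width on
+   every hardware overflow. The struct and helpers below are hypothetical;
+   width plays the role of the per-register width reported in sysfs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit virtual counter backed by a `width`-bit register. */
struct vcounter {
    unsigned width;     /* bits implemented by the hardware counter */
    uint64_t soft_high; /* software-maintained upper part */
};

static uint64_t width_mask(unsigned width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 ? UINT64_MAX : (((uint64_t)1 << width) - 1);
}

/* On a hardware overflow interrupt: account for one full wrap. */
static void vcounter_overflow(struct vcounter *c)
{
    c->soft_high += width_mask(c->width) + 1;   /* adds 2^width */
}

/* Combine the software upper part with the current hardware value. */
static uint64_t vcounter_read(const struct vcounter *c, uint64_t hw_value)
{
    return c->soft_high + (hw_value & width_mask(c->width));
}
```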
+
+II/ Kernel compilation
+
+    To enable perfmon, you need to enable CONFIG_PERFMON and also some of the
+    model-specific PMU modules.
+
+III/ OProfile interactions
+
+    The set of features offered by perfmon is rich enough to support migrating
+    OProfile on top of it. PMU programming and low-level interrupt handling
+    could then be done by perfmon, while the OProfile sampling buffer
+    management code in the kernel, as well as the way samples are exported to
+    users, could be retained by implementing them as a sampling format. This
+    is how OProfile works on Itanium.
+
+    The current interactions with OProfile are:
+       - On X86: Both subsystems can be compiled into the same kernel. Mutual
+                 exclusion between the two subsystems is enforced: when an
+                 OProfile session exists, no perfmon session can exist, and
+                 vice versa.
+
+       - On IA-64: OProfile works on top of perfmon. Because OProfile is a
+                   system-wide monitoring tool, the usual restrictions between
+                   per-thread and system-wide sessions apply.
+
+       - On PPC: No integration yet. Only one subsystem can be enabled.
+
+       - On MIPS: No integration yet. Only one subsystem can be enabled.
+
+IV/ User tools
+
+    We have released a simple monitoring tool to demonstrate the features of
+    the interface. The tool is called pfmon and it comes with a simple helper
+    library called libpfm. The library comes with a set of examples to show
+    how to use the kernel interface. Visit http://perfmon2.sf.net for details.
+
+    There may be other tools available for perfmon.
+
+V/ How to program?
+
+   The best way to learn how to program perfmon is to look at the source
+   code of the libpfm examples. The source code is available from:
+
+               http://perfmon2.sf.net
+
+VI/ System calls overview
+
+   In this section, we describe the state of the interface as submitted to the
+   kernel. There are more extensions available, and we will update the section
+   as they get implemented in the upstream kernel.
+
+   The interface is implemented by the following system calls:
+
+   * int pfm_create(int flags, pfarg_sinfo_t *s);
+
+      This function creates a perfmon per-thread session.
+      The flags parameter is currently unused and must be set to 0.
+
+      Upon return, and if s is not NULL, the kernel returns the list of
+      available PMC and PMD registers. Tools should not assume they have
+      access to the entire PMU; it may be shared with other kernel subsystems,
+      e.g., the NMI watchdog timer on X86.
+
+      The function returns the file descriptor identifying the session.
+
+   * int pfm_write(int fd, int flags, int type, void *d, size_t sz);
+
+      This function is used to write PMU registers for the session identified
+      by fd.
+
+      The flags parameter is currently unused and must be set to 0.
+
+      The type reflects the type of registers to write and determines the type
+      of the d parameter. The following types are defined:
+
+         - PFM_RW_PMC: write PMC registers; d points to a vector of pfarg_pmr_t
+         - PFM_RW_PMD: write PMD registers; d points to a vector of pfarg_pmr_t
+
+      The type field is not a bitmask: only one type can be passed per call.
+
+      The sz parameter describes the size of the vector of elements passed in d.
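+
+      As an illustration of sizing the d vector, the sketch below assumes that
+      sz is a byte count and that it is capped by the arg_mem_max sysfs
+      setting. The element layout is a hypothetical stand-in for pfarg_pmr_t,
+      not its real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte element standing in for pfarg_pmr_t. */
struct example_pmr {
    uint16_t reg_num;   /* logical register index */
    uint16_t reg_set;   /* event set the register belongs to */
    uint32_t reg_flags;
    uint64_t reg_value;
};

/* Elements that fit in one call when arguments are capped at max_bytes. */
static size_t elems_per_call(size_t max_bytes)
{
    return max_bytes / sizeof(struct example_pmr);
}

/* Calls needed to pass n elements under that cap. */
static size_t calls_needed(size_t n, size_t max_bytes)
{
    size_t per = elems_per_call(max_bytes);

    assert(per > 0);
    return (n + per - 1) / per;
}
```

+      Under these assumptions, 600 elements with arg_mem_max at 4096 would
+      need three calls, each passing up to 256 elements.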
+
+   * int pfm_read(int fd, int flags, int type, void *d, size_t sz);
+
+      This function is used to read PMU registers for the session identified
+      by fd.
+
+      The flags parameter is currently unused and must be set to 0.
+
+      The type reflects the type of registers to read and determines the type
+      of the d parameter. The following types are supported:
+
+         - PFM_RW_PMD: read PMD registers; d points to a vector of pfarg_pmr_t
+
+      The type field is not a bitmask: only one type can be passed per call.
+
+      Reading of PMC registers is not allowed.
+
+      The sz parameter describes the size of the vector of elements passed in d.
+
+
+   * int pfm_attach(int fd, int flags, int target);
+
+      This function is used to attach the session to a thread and to detach
+      it from that thread.
+
+      To attach, the thread is identified by target, which must be the value
+      returned by gettid() (not pthread_self()). For a single-threaded
+      process, that value is equal to the value returned by getpid().
+
+      To detach, the special target PFM_NO_TARGET must be passed.
+
+      The flags parameter is currently unused and must be set to 0.
+
+      The session is always attached as stopped, i.e., with monitoring
+      inactive. Detaching always stops monitoring.
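+
+      The target thread id can be obtained with the raw gettid system call,
+      since older C libraries provide no gettid() wrapper. The helper name is
+      hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id of the calling thread, usable as a pfm_attach() target. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

+      In the main thread of a process, current_tid() equals getpid(); in
+      other threads it differs.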
+
+   * int pfm_set_state(int fd, int flags, int state);
+
+      This function is used to set the running state of the session. The
+      target state is indicated by state.
+
+      The following states are defined; only one can be specified at a time:
+
+         - PFM_ST_START: start monitoring
+         - PFM_ST_STOP: stop monitoring
+
+      The flags parameter is currently unused and must be set to 0.
+
+   * int close(int fd);
+
+      To destroy a session, the regular close() system call is used.
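+
+   The session rules above can be summarized as a small state machine. The
+   model below is a hypothetical user-space sketch, not kernel code, and it
+   assumes that starting or stopping requires an attached session. A
+   plausible call order is pfm_create(), pfm_write() for PMCs then PMDs,
+   pfm_attach(), pfm_set_state(PFM_ST_START), pfm_set_state(PFM_ST_STOP),
+   pfm_read(), and finally close().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the documented per-thread session rules. */
enum sess_state { SESS_DETACHED, SESS_STOPPED, SESS_RUNNING, SESS_CLOSED };

struct session { enum sess_state st; };

static bool attached(const struct session *s)
{
    return s->st == SESS_STOPPED || s->st == SESS_RUNNING;
}

/* pfm_attach(fd, 0, tid): a session is always attached as stopped. */
static bool sess_attach(struct session *s)
{
    if (s->st != SESS_DETACHED)
        return false;
    s->st = SESS_STOPPED;
    return true;
}

/* pfm_attach(fd, 0, PFM_NO_TARGET): detaching always stops monitoring. */
static bool sess_detach(struct session *s)
{
    if (!attached(s))
        return false;
    s->st = SESS_DETACHED;
    return true;
}

/* pfm_set_state(fd, 0, PFM_ST_START or PFM_ST_STOP). */
static bool sess_set_running(struct session *s, bool start)
{
    if (!attached(s))
        return false;   /* assumption: requires an attached session */
    s->st = start ? SESS_RUNNING : SESS_STOPPED;
    return true;
}

/* close(fd) destroys the session from any state. */
static void sess_close(struct session *s)
{
    s->st = SESS_CLOSED;
}
```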
+
+
+VII/ /sys interface overview
+
+   Refer to Documentation/ABI/testing/sysfs-perfmon-* for a detailed
+   description of the sysfs interface of perfmon2.
+
+VIII/ debugfs interface overview
+
+  Refer to Documentation/perfmon-debugfs.txt for a detailed description of the
+  debug and statistics interface of perfmon.
+
+IX/ Documentation
+
+   Visit http://perfmon2.sf.net
Index: o3/Documentation/ABI/testing/sysfs-perfmon
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ o3/Documentation/ABI/testing/sysfs-perfmon  2008-10-16 12:25:18.000000000 +0200
@@ -0,0 +1,42 @@
+What:          /sys/kernel/perfmon
+Date:          Oct 2008
+KernelVersion: 2.6.27
+Contact:       [EMAIL PROTECTED]
+
+Description:   Provides the configuration interface for the perfmon subsystem.
+               The tree contains information about the detected hardware,
+               the current state of the subsystem, and some configuration
+               parameters.
+
+               The tree consists of the following entries:
+
+       /sys/kernel/perfmon/debug (read-write):
+
+               Enable perfmon debugging output. The traces are rate-limited
+               to avoid flooding the console. It is possible to change the
+               throttling via /proc/sys/kernel/printk_ratelimit.
+
+               The value is interpreted as a bitmask.  Each bit enables a
+               particular type of debug messages. Refer to the file
+               include/linux/perfmon_kern.h for more information.
+
+       /sys/kernel/perfmon/task_group (read-write):
+
+               Group of users allowed to create a per-thread context (session).
+               -1 means any group.
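+
+               A tool could pre-check this setting before creating a
+               session. The sketch below is hypothetical and only approximates
+               the rule with the caller's effective and supplementary groups;
+               the kernel's own check is authoritative and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pre-check of the task_group rule (-1 allows any group). */
static bool may_create_session(long task_group)
{
    gid_t groups[256];
    int n;

    if (task_group == -1)
        return true;
    if ((long)getegid() == task_group)
        return true;
    /* getgroups() returns -1 if the caller has more than 256 groups. */
    n = getgroups(256, groups);
    for (int i = 0; i < n; i++)
        if ((long)groups[i] == task_group)
            return true;
    return false;
}
```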
+
+       /sys/kernel/perfmon/task_sessions_count (read-only):
+
+               Number of per-thread contexts (sessions) currently attached
+               to threads.
+
+       /sys/kernel/perfmon/version (read-only):
+
+               Perfmon interface revision number.
+
+       /sys/kernel/perfmon/arg_mem_max (read-write):
+
+               Maximum size of vector arguments expressed in bytes.
+               It can be modified but must be at least a page.
+               Default: PAGE_SIZE
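+
+               A tool might read this limit with a small helper that falls
+               back to the page size, the documented default, when the file
+               is absent. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read an integer sysfs attribute, or return `dfl` if missing/unparsable. */
static long read_sysfs_long(const char *path, long dfl)
{
    FILE *f = fopen(path, "r");
    long v;

    if (f == NULL)
        return dfl;
    if (fscanf(f, "%ld", &v) != 1)
        v = dfl;
    fclose(f);
    return v;
}

/* Default for arg_mem_max: one page. */
static long default_arg_mem_max(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

+               Usage: read_sysfs_long("/sys/kernel/perfmon/arg_mem_max",
+               default_arg_mem_max()).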
+
Index: o3/Documentation/ABI/testing/sysfs-perfmon-pmu
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ o3/Documentation/ABI/testing/sysfs-perfmon-pmu      2008-10-16 12:25:04.000000000 +0200
@@ -0,0 +1,48 @@
+What:          /sys/kernel/perfmon/pmu_desc
+Date:          Nov 2007
+KernelVersion: 2.6.24
+Contact:       [EMAIL PROTECTED]
+
+Description:   Provides information about the active PMU description
+               module.  The module contains the mapping of the actual
+               performance counter registers onto the logical PMU exposed by
+               perfmon.  There is at most one PMU description module loaded
+               at any time.
+
+               The sysfs PMU tree provides a description of the mapping for
+               each register. There is one subdirectory per config and data
+               register, along with an entry for the name of the PMU model.
+
+               The entries are as follows:
+
+       /sys/kernel/perfmon/pmu_desc/model (read-only):
+
+               Name of the PMU model in clear text, zero-terminated.
+
+       Then, each logical PMU register XX (pmcXX or pmdXX) gets a subtree
+       with the following entries:
+
+       /sys/kernel/perfmon/pmu_desc/pm*XX/addr (read-only):
+
+               The physical address or index of the actual underlying hardware
+               register. On Itanium, it is the register index, whereas on X86
+               processors, it is the actual MSR address.
+
+       /sys/kernel/perfmon/pmu_desc/pm*XX/dfl_val (read-only):
+
+               The default value of the register in hexadecimal.
+
+       /sys/kernel/perfmon/pmu_desc/pm*XX/name (read-only):
+
+               The name of the hardware register.
+
+       /sys/kernel/perfmon/pmu_desc/pm*XX/rsvd_msk (read-only):
+
+               Bitmask of reserved bits, i.e., bits which cannot be changed
+               by applications. When a bit is set, it means the corresponding
+               bit in the actual register is reserved.
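+
+               Assuming reserved bits keep their default value when a
+               register is written (the kernel might instead reject such
+               writes), the effective value can be computed from dfl_val and
+               rsvd_msk as sketched below. The helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal sysfs value such as "0xff00" (the 0x is optional). */
static uint64_t parse_hex(const char *s)
{
    return strtoull(s, NULL, 16);
}

/* Reserved bits (set in rsvd_msk) keep dfl_val; the rest come from user_val. */
static uint64_t apply_rsvd_msk(uint64_t user_val, uint64_t dfl_val,
                               uint64_t rsvd_msk)
{
    return (user_val & ~rsvd_msk) | (dfl_val & rsvd_msk);
}
```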
+
+       /sys/kernel/perfmon/pmu_desc/pm*XX/width (read-only):
+
+               The width in bits of the register. This field is only
+               relevant for counter registers.

-- 
Regards,
Peter Teoh
