Linux 4.3 introduced two new record types for recording context
switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.

The advantage over the existing tracepoint and software context
switch events is primarily that full switch in/out data can be
gathered even in the face of restrictive perf_event_paranoid
settings.
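
For illustration only, here is a minimal sketch of how a consumer might
decode the two record types from raw bytes, following the layouts
documented in the diff below. The SKETCH_ and sketch_ names are
hypothetical and not part of any API. The numeric values are copied from
the kernel's <linux/perf_event.h> (PERF_RECORD_SWITCH,
PERF_RECORD_SWITCH_CPU_WIDE, PERF_RECORD_MISC_SWITCH_OUT) so that the
sketch also builds against pre-4.3 headers. It does not parse the
trailing sample_id fields, and it assumes the caller has already copied
one complete record out of the ring buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: values mirrored from <linux/perf_event.h>; the SKETCH_
 * names are hypothetical stand-ins for the PERF_RECORD_* constants. */
enum {
    SKETCH_RECORD_SWITCH          = 14,
    SKETCH_RECORD_SWITCH_CPU_WIDE = 15,
    SKETCH_MISC_SWITCH_OUT        = 1 << 13,
};

/* Same layout as struct perf_event_header. */
struct sketch_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;
};

/* Decoded view of a switch record (hypothetical helper type). */
struct sketch_switch {
    int      is_switch;      /* 1 if the record is one of the two types */
    int      switch_out;     /* 1 = away from the current process */
    int      cpu_wide;       /* 1 if next_prev_pid/tid are valid */
    uint32_t next_prev_pid;
    uint32_t next_prev_tid;
};

/* Decode one record from buf. Returns 0 on success (including records
 * of other types, which leave is_switch at 0) and -1 if the buffer or
 * the record is too short. */
int sketch_decode_switch(const void *buf, size_t len,
                         struct sketch_switch *out)
{
    const unsigned char *p = buf;
    struct sketch_header hdr;

    memset(out, 0, sizeof(*out));
    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, p, sizeof(hdr));
    if (hdr.size < sizeof(hdr) || hdr.size > len)
        return -1;

    if (hdr.type != SKETCH_RECORD_SWITCH &&
        hdr.type != SKETCH_RECORD_SWITCH_CPU_WIDE)
        return 0;

    out->is_switch = 1;
    out->switch_out = (hdr.misc & SKETCH_MISC_SWITCH_OUT) != 0;

    if (hdr.type == SKETCH_RECORD_SWITCH_CPU_WIDE) {
        /* The CPU-wide variant carries two u32 fields after the header. */
        if (hdr.size < sizeof(hdr) + 2 * sizeof(uint32_t))
            return -1;
        out->cpu_wide = 1;
        memcpy(&out->next_prev_pid, p + sizeof(hdr), sizeof(uint32_t));
        memcpy(&out->next_prev_tid, p + sizeof(hdr) + sizeof(uint32_t),
               sizeof(uint32_t));
    }
    return 0;
}
```

A caller would hand each record to sketch_decode_switch() and check
is_switch before looking at the other fields. The next_prev_pid and
next_prev_tid fields are only meaningful when cpu_wide is set.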

Signed-off-by: Vince Weaver <vincent.wea...@maine.edu>

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 68b99bb..04a0cf5 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -243,8 +243,9 @@ struct perf_event_attr {
           comm_exec      :  1,  /* flag comm events that are
                                    due to exec */
           use_clockid    :  1,  /* use clockid for time fields */
+          context_switch :  1,  /* context switch data */
 
-          __reserved_1   : 38;
+          __reserved_1   : 37;
 
     union {
         __u32 wakeup_events;    /* wakeup every n events */
@@ -1112,6 +1113,21 @@ field.
 This can make it easier to correlate perf sample times with
 timestamps generated by other tools.
 .TP
+.IR "context_switch" " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This enables the generation of
+.B PERF_RECORD_SWITCH
+records when a context switch occurs.
+It also enables the generation of
+.B PERF_RECORD_SWITCH_CPU_WIDE
+records when sampling in CPU-wide mode.
+This functionality is in addition to existing tracepoint and
+software events for measuring context switches.
+The advantage of this method is that it will give full
+information even with strict
+.I perf_event_paranoid
+settings.
+.TP
 .IR "wakeup_events" ", " "wakeup_watermark"
 This union sets how many samples
 .RI ( wakeup_events )
@@ -1792,7 +1808,8 @@ Sample happened in guest user code.
 .RE
 
 .RS
-In addition, one of the following bits can be set:
+The following three statuses are generated by
+different record types, so they alias to the same bit:
 .TP
 .BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
 .\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
@@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
 if a process name change was caused by an
 .BR exec (2)
 system call.
-It is an alias for
-.B PERF_RECORD_MISC_MMAP_DATA
-since the two values would not be set in the same record.
+.TP
+.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+When a
+.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
+record is generated, this bit indicates that the
+context switch is away from the current process
+(instead of into the current process).
+.RE
+
+.RS
+In addition, the following bits can be set:
 .TP
 .B PERF_RECORD_MISC_EXACT_IP
 This indicates that the content of
@@ -2583,6 +2609,59 @@ struct {
 .I lost
 the number of potentially lost samples.
 .RE
+.TP
+.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This record indicates a context switch has happened.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+    struct perf_event_header header;
+    struct sample_id sample_id;
+};
+.fi
+.TP
+.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+As with
+.B PERF_RECORD_SWITCH
+this record indicates a context switch has happened,
+but it only occurs when sampling in CPU-wide mode
+and provides additional information on the process
+being switched to/from.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+    struct perf_event_header header;
+    u32 next_prev_pid;
+    u32 next_prev_tid;
+    struct sample_id sample_id;
+};
+.fi
+.RS
+.TP
+.I next_prev_pid
+The process ID of the previous (if switching in)
+or next (if switching out) process on the CPU.
+.TP
+.I next_prev_tid
+The thread ID of the previous (if switching in)
+or next (if switching out) thread on the CPU.
+.RE
 .RE
 .SS Overflow handling
 Events can be set to notify when a threshold is crossed,
