The pseries Shared Processor Logical Partition(SPLPAR) machines can
retrieve a log of dispatch and preempt events from the hypervisor
using data from Disptach Trace Log(DTL) buffer. With this information,
user can retrieve when and why each dispatch & preempt has occurred.
The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
via perf.

- Patches 1 to 6 has powerpc PMU driver code changes to capture DTL
  trace in perf.data. And patch 14 has documentation update.
- Patch 7 to 13 is perf tools side code changes to enable perf
  report/script on perf.data file

Infrastructure used
===================

The VPA DTL PMU counters do not interrupt on overflow or generate any
PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
nterval can be provided by user via sample_period field in nano seconds.
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
Trace Log) contains information about dispatch/preempt, enqueue time etc.
We directly copy the DTL buffer data as part of auxiliary buffer and it
will be processed later. This will avoid time taken to create samples
in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
entries makes use of AUX support in perf infrastructure. On the tools side,
this data is made available as PERF_RECORD_AUXTRACE records.

To corelate each DTL entry with other events across CPU's, an auxtrace_queue
is created for each CPU. Each auxtrace queue has a array/list of auxtrace 
buffers.
All auxtrace queues is maintained in auxtrace heap. The queues are sorted
based on timestamp. When the different PERF_RECORD_XX records are processed,
compare the timestamp of perf record with timestamp of top element in the
auxtrace heap so that DTL events can be co-related with other events
Process the auxtrace queue if the timestamp of element from heap is
lower than timestamp from entry in perf record. Sometimes it could happen that
one buffer is only partially processed. if the timestamp of occurrence of
another event is more than currently processed element in the queue, it will
move on to next perf record. So keep track of position of buffer to continue
processing next time. Update the timestamp of the auxtrace heap with the 
timestamp
of last processed entry from the auxtrace buffer.

This infrastructure ensures dispatch trace log entries can be corelated
and presented along with other events like sched.

vpa-dtl PMU example usage

  # ls /sys/devices/vpa_dtl/
  events  format  perf_event_mux_interval_ms  power  subsystem  type  uevent


To capture the DTL data using perf record:

  # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1

The result can be interpreted using perf report. Snippet of perf report -D:

  # ./perf report -D

There are different PERF_RECORD_XX records. In that records corresponding to
auxtrace buffers includes:

1. PERF_RECORD_AUX
   Conveys that new data is available in AUX area

2. PERF_RECORD_AUXTRACE_INFO
   Describes offset and size of auxtrace data in the buffers

3. PERF_RECORD_AUXTRACE
   This is the record that defines the auxtrace data which here in case of
   vpa-dtl pmu is dispatch trace log data.

Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump

0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690  offset: 0  ref: 0  idx: 0 
 tid: -1  cpu: 0
.
. ... VPA DTL PMU data: size 1680 bytes, entries is 35
.  00000000: boot_tb: 21349649546353231, tb_freq: 512000000
.  00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, 
waiting_to_ready_time:6611773
.  00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, 
waiting_to_ready_time:15359437
.  00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, 
waiting_to_ready_time:5100709
.  000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, 
waiting_to_ready_time:30714243
.  000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, 
waiting_to_ready_time:15350648
.  00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, 
waiting_to_ready_time:15353446
.  00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, 
waiting_to_ready_time:15355126
.  00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, 
enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, 
waiting_to_ready_time:5104665

Above is representation of dtl entry of below format:

struct dtl_entry {
        u8      dispatch_reason;
        u8      preempt_reason;
        u16     processor_id;
        u32     enqueue_to_dispatch_time;
        u32     ready_to_enqueue_time;
        u32     waiting_to_ready_time;
        u64     timebase;
        u64     fault_addr;
        u64     srr0;
        u64     srr1;
 };

First two fields represent the dispatch reason and preempt reason. The post
procecssing of PERF_RECORD_AUXTRACE records will translate to meaninful data
for user to consume.

Visualize the dispatch trace log entries with perf report:

  # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.300 MB perf.data ]

  # ./perf report
  # Samples: 321  of event 'vpa-dtl'
  # Event count (approx.): 321
  #
  # Children      Self  Command  Shared Object      Symbol
  # ........  ........  .......  .................  
..............................
  #
     100.00%   100.00%  swapper  [kernel.kallsyms]  [k] 
plpar_hcall_norets_notrace

Visualize the dispatch trace log entries with perf script:

   # ./perf script
            perf   13322 [002]   233.835807:                     
sched:sched_switch: perf:13322 [120] R ==> migration/2:27 [0]
     migration/2      27 [002]   233.835811:               
sched:sched_migrate_task: comm=perf pid=13322 prio=120 orig_cpu=2 dest_cpu=3
     migration/2      27 [002]   233.835818:               
sched:sched_stat_runtime: comm=migration/2 pid=27 runtime=9214 [ns]
     migration/2      27 [002]   233.835819:                     
sched:sched_switch: migration/2:27 [0] S ==> swapper/2:0 [120]
         swapper       0 [002]   233.835822:                                
vpa-dtl: timebase: 338954486062657 dispatch_reason:decrementer_interrupt, 
preempt_reason:H_CEDE, enqueue_to_dispatch_time:435,                      
ready_to_enqueue_time:0, waiting_to_ready_time:34775058, processor_id: 202 
c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
         swapper       0 [001]   233.835886:                                
vpa-dtl: timebase: 338954486095398 dispatch_reason:priv_doorbell, 
preempt_reason:H_CEDE, enqueue_to_dispatch_time:542,                      
ready_to_enqueue_time:0, waiting_to_ready_time:1245360, processor_id: 201 
c0000000000f8094 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])

Aboorva Devarajan (1):
  powerpc/time: Expose boot_tb via accessor

Athira Rajeev (11):
  powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for
    capturing DTL data
  powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
  powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake
    up is needed
  tools/perf: Add basic CONFIG_AUXTRACE support for VPA pmu on powerpc
  tools/perf: process auxtrace events and display in perf report -D
  tools/perf: Add event name as vpa-dtl of PERF_TYPE_SYNTH type to
    present DTL samples
  tools/perf: Allocate and setup aux buffer queue to help co-relate with
    other events across CPU's
  tools/perf: Process the DTL entries in queue and deliver samples
  tools/perf: Add support for printing synth event details via default
    callback
  tools/perf: Enable perf script to present the DTL entries
  powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU

Kajol Jain (2):
  powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
  docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs
    event format entries for vpa_dtl pmu

 .../sysfs-bus-event_source-devices-vpa-dtl    |  25 +
 Documentation/arch/powerpc/index.rst          |   1 +
 Documentation/arch/powerpc/vpa-dtl.rst        | 155 ++++
 arch/powerpc/include/asm/time.h               |   2 +
 arch/powerpc/kernel/time.c                    |   7 +-
 arch/powerpc/perf/Makefile                    |   2 +-
 arch/powerpc/perf/vpa-dtl.c                   | 605 ++++++++++++++
 tools/perf/arch/powerpc/util/Build            |   1 +
 tools/perf/arch/powerpc/util/auxtrace.c       | 122 +++
 tools/perf/builtin-script.c                   |  26 +
 tools/perf/util/Build                         |   1 +
 tools/perf/util/auxtrace.c                    |   4 +
 tools/perf/util/auxtrace.h                    |   1 +
 tools/perf/util/event.h                       |   1 +
 tools/perf/util/powerpc-vpadtl.c              | 756 ++++++++++++++++++
 tools/perf/util/powerpc-vpadtl.h              |  45 ++
 16 files changed, 1752 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-bus-event_source-devices-vpa-dtl
 create mode 100644 Documentation/arch/powerpc/vpa-dtl.rst
 create mode 100644 arch/powerpc/perf/vpa-dtl.c
 create mode 100644 tools/perf/arch/powerpc/util/auxtrace.c
 create mode 100644 tools/perf/util/powerpc-vpadtl.c
 create mode 100644 tools/perf/util/powerpc-vpadtl.h

-- 
2.47.1


Reply via email to