If the memory events don't carry a HITM tag (as with Arm SPE), perf c2c
cannot rely on the HITM display to report cache false sharing.
Alternatively, we can use the LLC accesses and the multi-thread info to
locate the data address of potential false sharing.  If we then connect
this with the source code and analyze the threads' execution timing, we
can conclude whether loads and stores hit the same cache line at the
same time, which is helpful for resolving the cache false sharing issue.
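
To make the idea concrete, here is a minimal sketch of the check a reader
effectively performs on the report: group accesses by cache line, then look
for different threads touching the same line with at least one store.  It is
not code from this series; the names cl_base(), cl_offset(), struct access
and may_contend() are hypothetical, and the 64-byte line size is an
assumption based on the 0x40 stride between the cache line addresses in the
sample output below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (sample addresses are 0x40 apart). */
#define CACHELINE_SIZE 64ULL

/* Base address of the cache line that contains addr. */
static inline uint64_t cl_base(uint64_t addr)
{
	return addr & ~(CACHELINE_SIZE - 1);
}

/* Byte offset of addr inside its cache line ("Offset" in the pareto). */
static inline uint64_t cl_offset(uint64_t addr)
{
	return addr & (CACHELINE_SIZE - 1);
}

/* Hypothetical record for one sampled memory access. */
struct access {
	uint64_t addr;
	int tid;
	int is_store;
};

/*
 * Two accesses may contend when they fall into the same cache line, come
 * from different threads, and at least one of them is a store.  Whether
 * this is false or true sharing still needs the offsets and source lines.
 */
static inline int may_contend(const struct access *a, const struct access *b)
{
	assert(a && b);
	return cl_base(a->addr) == cl_base(b->addr) &&
	       a->tid != b->tid &&
	       (a->is_store || b->is_store);
}
```

With the sample data, a store at offset 0x0 by tid 14102 and a load at
offset 0x20 by tid 14104 both map to cache line 0x563b01e83100, so they
would be flagged for a closer look at the source lines.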

This patch set enables a display sorted on LLC load accesses; it adds
dimensions for total LLC hits and LLC load accesses, and these dimensions
are used for the shared cache line table and the pareto.

This patch set depends on the patch set "perf c2c: Refine the
organization of metrics" [1].

[1] https://lore.kernel.org/patchwork/cover/1321499/

With this patch set, the 'llc' display looks as follows:

  # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
        1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
        2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
        3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
        4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
        5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
        6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                    ---------- cycles ----------    Total       cpu                                  Shared
  #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0      143      505     2582      691      0x563b01e83100
    -------------------------------------------------------------
            96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
             0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]
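
As a reading aid for the two "LLC Hit" columns in the shared data cache
line table: the sample values are consistent with "LLC Hit Total" being
LclHit + LclHitm for a cache line, and "LLC Hit Pct" being that line's
share of the sum over all lines.  The sketch below only illustrates this
arithmetic; struct cl_stats, llc_hit_total() and llc_hit_pct() are
hypothetical names and are not taken from builtin-c2c.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cache-line counters; only the LLC load columns are modelled. */
struct cl_stats {
	uint32_t lcl_hit;	/* "LclHit" column */
	uint32_t lcl_hitm;	/* "LclHitm" column */
};

/* "LLC Hit Total" for one cache line. */
static inline uint32_t llc_hit_total(const struct cl_stats *s)
{
	return s->lcl_hit + s->lcl_hitm;
}

/* "LLC Hit Pct": share of line idx in the LLC hits of all n lines, in percent. */
static inline double llc_hit_pct(const struct cl_stats *lines, size_t n, size_t idx)
{
	uint64_t sum = 0;

	assert(idx < n);
	for (size_t i = 0; i < n; i++)
		sum += llc_hit_total(&lines[i]);

	/* Avoid dividing by zero when no LLC hit was recorded at all. */
	return sum ? 100.0 * llc_hit_total(&lines[idx]) / sum : 0.0;
}
```

For the seven cache lines above, the totals are 648, 263, 77, 1, 1, 1 and
1, which sum to 992; 648 / 992 gives the 65.32% shown for index 0.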


Leo Yan (8):
  perf mem: Add structure field c2c_stats::tot_llchit
  perf c2c: Add dimensions for total LLC hit
  perf c2c: Add dimensions for LLC load hit
  perf c2c: Change to general naming for macros
  perf c2c: Rename for shared cache line stats
  perf c2c: Refactor hist entry validation
  perf c2c: Add option '-d llc' for sorting with LLC load
  perf c2c: Update documentation for display option 'llc'

 tools/perf/Documentation/perf-c2c.txt |  18 +-
 tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
 tools/perf/util/mem-events.c          |   3 +
 tools/perf/util/mem-events.h          |   1 +
 4 files changed, 286 insertions(+), 69 deletions(-)

-- 
2.17.1
