On Wed, Jul 17, 2024 at 11:12 PM Athira Rajeev <atraj...@linux.vnet.ibm.com> wrote: > > > > > On 18 Jul 2024, at 11:04 AM, Namhyung Kim <namhy...@kernel.org> wrote: > > > > Hello, > > > > On Sat, Jul 13, 2024 at 10:25:11PM +0530, Athira Rajeev wrote: > >> The patchset from Namhyung added support for data type profiling > >> in perf tool. This enabled support to associate PMU samples to data > >> types they refer using DWARF debug information. With the upstream > >> perf, currently it possible to run perf report or perf annotate to > >> view the data type information on x86. > >> > >> Initial patchset posted here had changes need to enable data type > >> profiling support for powerpc. > >> > >> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9...@csgroup.eu/T/ > >> > >> Main change were: > >> 1. powerpc instruction nmemonic table to associate load/store > >> instructions with move_ops which is use to identify if instruction > >> is a memory access one. > >> 2. To get register number and access offset from the given > >> instruction, code uses fields from "struct arch" -> objump. > >> Added entry for powerpc here. > >> 3. A get_arch_regnum to return register number from the > >> register name string. > >> > >> But the apporach used in the initial patchset used parsing of > >> disassembled code which the current perf tool implementation does. > >> > >> Example: lwz r10,0(r9) > >> > >> This line "lwz r10,0(r9)" is parsed to extract instruction name, > >> registers names and offset. Also to find whether there is a memory > >> reference in the operands, "memory_ref_char" field of objdump is used. > >> For x86, "(" is used as memory_ref_char to tackle instructions of the > >> form "mov (%rax), %rcx". > >> > >> In case of powerpc, not all instructions using "(" are the only memory > >> instructions. Example, above instruction can also be of extended form (X > >> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category > >> and extract the source/target registers, second patchset added support to > >> use > >> raw instruction. With raw instruction, macros are added to extract opcode > >> and register fields. > >> Link to second patchset: > >> https://lore.kernel.org/all/20240506121906.76639-1-atraj...@linux.vnet.ibm.com/ > >> > >> Example representation using --show-raw-insn in objdump gives result: > >> > >> 38 01 81 e8 ld r4,312(r1) > >> > >> Here "38 01 81 e8" is the raw instruction representation. In powerpc, > >> this translates to instruction form: "ld RT,DS(RA)" and binary code > >> as: > >> _____________________________________ > >> | 58 | RT | RA | DS | | > >> ------------------------------------- > >> 0 6 11 16 30 31 > >> > >> Second patchset used "objdump" again to read the raw instruction. > >> But since there is no need to disassemble and binary code can be read > >> directly from the DSO, third patchset (ie this patchset) uses below > >> apporach. The apporach preferred in powerpc to parse sample for data > >> type profiling in V3 patchset is: > >> - Read directly from DSO using dso__data_read_offset > >> - If that fails for any case, fallback to using libcapstone > >> - If libcapstone is not supported, approach will use objdump > >> > >> Patchset adds support to pick the opcode and reg fields from this > >> raw/binary instruction code. This approach came in from review comment > >> by Segher Boessenkool and Christophe for the initial patchset. > >> > >> Apart from that, instruction tracking is enabled for powerpc and > >> support function is added to find variables defined as registers > >> Example, in powerpc, below two registers are > >> defined to represent variable: > >> 1. r13: represents local_paca > >> register struct paca_struct *local_paca asm("r13"); > >> > >> 2. r1: represents stack_pointer > >> register void *__stack_pointer asm("r1"); > >> > >> These are handled in this patchset. > >> > >> - Patch 1 is to rearrange register state type structures to header file > >> so that it can referred from other arch specific files > >> - Patch 2 is to make instruction tracking as a callback to"struct arch" > >> so that it can be implemented by other archs easily and defined in arch > >> specific files > >> - Patch 3 is to handle state type regs array size for x86 and powerpc > >> - Patch 4 adds support to capture and parse raw instruction in powerpc > >> using dso__data_read_offset utility > >> - Patch 4 also adds logic to support using objdump when doing default "perf > >> report" or "perf annotate" since it that needs disassembled instruction. > >> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc > >> - Patch 6 update parameters for reg extract functions to use raw > >> instruction on powerpc > >> - Patch 7 updates ins__find to carry raw_insn and also adds parse > >> callback for memory instructions for powerpc > >> - Patch 8 add support to identify memory instructions of opcode 31 in > >> powerpc > >> - Patch 9 adds more instructions to support instruction tracking in powerpc > >> - Patch 10 and 11 handles instruction tracking for powerpc. > >> - Patch 12, 13 and 14 add support to use libcapstone in powerpc > >> - Patch 15 and patch 16 handles support to find global register variables > >> - PAtch 17 updates data type compare functions data_type_cmp and > >> sort__typeoff_sort to include var_name along with type_name in > >> comparison. > >> - Patch 18 handles insn-stat option for perf annotate > >> > >> Note: > >> - There are remaining unknowns (25%) as seen in annotate Instruction stats > >> below. > >> - This patchset is not tested on powerpc32. In next step of enhancements > >> along with handling remaining unknowns, plan to cover powerpc32 changes > >> based on how testing goes. > >> > >> With the current patchset: > >> > >> ./perf record -a -e mem-loads sleep 1 > >> ./perf report -s type,typeoff --hierarchy --group --stdio > >> ./perf annotate --data-type --insn-stat > >> > >> perf annotate logs: > >> ================== > >> > >> > >> Annotate Instruction stats > >> total 609, ok 446 (73.2%), bad 163 (26.8%) > >> > >> Name/opcode : Good Bad > >> ----------------------------------------------------------- > >> 58 : 323 80 > >> 32 : 49 43 > >> 34 : 33 11 > >> OP_31_XOP_LDX : 8 20 > >> 40 : 23 0 > >> OP_31_XOP_LWARX : 5 1 > >> OP_31_XOP_LWZX : 2 3 > >> OP_31_XOP_LDARX : 3 0 > >> 33 : 0 2 > >> OP_31_XOP_LBZX : 0 1 > >> OP_31_XOP_LWAX : 0 1 > >> OP_31_XOP_LHZX : 0 1 > >> > >> perf report logs: > >> ================= > >> > >> Total Lost Samples: 0 > >> > >> Samples: 1K of event 'mem-loads' > >> Event count (approx.): 937238 > >> > >> Overhead Data Type Data Type Offset > >> ........ ......... ................ > >> 48.60% (unknown) (unknown) +0 (no field) > >> 11.42% long unsigned int long unsigned int +0 (current_stack_pointer) > >> 4.68% struct paca_struct struct paca_struct +2312 (__current) > >> 4.57% struct paca_struct struct paca_struct +2354 (irq_soft_mask) > >> 2.69% struct paca_struct struct paca_struct +2808 (canary) > >> 2.68% struct paca_struct struct paca_struct +8 (paca_index) > >> 2.24% struct paca_struct struct paca_struct +48 (data_offset) > >> 1.43% long unsigned int long unsigned int +0 (no field) > >> 1.41% struct vm_fault struct vm_fault +0 (vma) > >> 1.29% struct task_struct struct task_struct +276 (flags) > >> 1.03% struct pt_regs struct pt_regs +264 (user_regs.msr) > >> 0.90% struct security_hook_list struct security_hook_list +0 > >> (list.next) > >> 0.76% struct irq_desc struct irq_desc +304 (irq_data.chip) > >> 0.76% struct rq struct rq +2856 (cpu) > >> 0.72% long long unsigned int long long unsigned int +0 (no field) > > > > Thanks for your work! But I think you need to split the basic part and > > global register support part which needs more review. > > > > For the patch 1 to 14: > > Reviewed-by: Namhyung Kim <namhy...@kernel.org> > > Hi Namhyung > > Thanks for all suggestions and reviews. I will check latest comments for > patches 15 and 16 (also patch 17 is dependent the global register support > part). But patch 18 is not dependent on global register support patches. > Along with patches 1 to 14, can you please add patch 18 also ?
Sure, feel free to add it to the patch 18. Reviewed-by: Namhyung Kim <namhy...@kernel.org> Thanks, Namhyung