Hi Sebastien,

Super super interesting stuff!

On Tue, Apr 11, 2023 at 09:28:31AM +0200, Sebastien Marie wrote:
> ## compile and collect profil information (-tu option on ktrace is optional)
> $ cc -static -pg test.c
> $ ktrace -di -tu ./a.out
>
> ## get gmon.out file
> $ kdump -u gmon.out | unvis > gmon.out
>
> ## get gmon.out.$name.$pid for multiple processes
> ## - first get pid process-name
> ## - extract each gmon.out for each pid and store in "gmon.out.$name.$pid" file
> $ kdump -tu | sed -ne 's/^ \([0-9][0-9]*\) \([^ ]*\) .*/\1 \2/p' | sort -u \
>   | while read pid name; do kdump -u gmon.out -p $pid | unvis > gmon.out.$name.$pid ; done
>
> kdump diff from otto@ mallocdump is need for 'kdump -u label'.
>
> Feedback would be appreciated.

I used your diff to get more insight into the performance profile of
rpki-client. I always considered profiling rpki-client a bit tricky
because of multiple processes (privsep) and pledge, but with your diff
it suddenly became quite easy.

I profiled a workload that's easy to reliably reproduce. Then I used
gprof2dot to make images with and without a diff intended to improve
performance (the diff reduces duplicate work being done in the X.509
validator).

With https://marc.info/?l=openbsd-tech&m=168354732821734&w=2 applied:

    main process: https://sobornost.net/~job/main.png
    parser process: https://sobornost.net/~job/parser.png

Without that 'flags |= X509_V_FLAG_PARTIAL_CHAIN' diff:

    https://sobornost.net/~job/main-without-partialchains.png
    https://sobornost.net/~job/parser-without-partialchains.png

Look for the rectangle that says 'addr_contains' (this is a function in
lib/libcrypto/x509/x509_addr.c). One can see that tb@'s diff reduced
use of 'addr_contains' from 17.06%/5.03% down to 5.05%/0.59%.

It's very handy to be able to confirm that there was an improvement in
the area where the diff was expected to cause an improvement (on top of
confirming performance improvement in overall runtime).

Thanks for sharing this! I hope it lands in base at some point.

Kind regards,

Job