Hi Tyrel, Tyrel Datwyler <[email protected]> writes: > On 6/19/20 8:34 AM, Scott Cheloha wrote: >> The H_GetPerformanceCounterInfo PHYP hypercall has a subcall, >> Affinity_Domain_Info_By_Partition, which returns, among other things, >> a "partition affinity score" for a given LPAR. This score, a value on >> [0-100], represents the processor-memory affinity for the LPAR in >> question. A score of 0 indicates the worst possible affinity while a >> score of 100 indicates perfect affinity. The score can be used to >> reason about performance. >> >> This patch adds the score for the local LPAR to the lparcfg procfile >> under a new 'partition_affinity_score' key. > > I expect that you will probably get a NACK from Michael on this. The overall > desire is to move away from these dated /proc interfaces. While its true that > I > did add a new value recently it was strictly to facilitate and correct the > calculation of a derived value that was already dependent on a couple other > existing values in lparcfg. > > With that said I would expect that you would likely be advised to expose this > as > a sysfs attribute. The question is where? We probably should put some thought > in > to this as I would like to port each lparcfg value over to sysfs so that we > can > move to deprecating lparcfg. Putting everything under something like > /sys/kernel/lparcfg/* maybe. Michael may have a better suggestion.
I think this score fits pretty naturally in lparcfg: it's a simple metric that is specific to the pseries/papr platform, like everything else in there. A few dozen key=value pairs contained in a single file is simple and efficient, unlike sysfs with its rather inconsistently applied one-value-per-file convention. Surely it's OK if lparcfg gains a line every few years? >> The H_GetPerformanceCounterInfo hypercall is already used elsewhere in >> the kernel, in powerpc/perf/hv-gpci.c. Refactoring that code and this >> code into a more general API might be worthwhile if additional modules >> require the hypercall in the future. > > If you are duplicating code its likely you should already be doing this. See > the > rest of my comments about below. > >> >> Signed-off-by: Scott Cheloha <[email protected]> >> --- >> arch/powerpc/platforms/pseries/lparcfg.c | 53 ++++++++++++++++++++++++ >> 1 file changed, 53 insertions(+) >> >> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c >> b/arch/powerpc/platforms/pseries/lparcfg.c >> index b8d28ab88178..b75151eee0f0 100644 >> --- a/arch/powerpc/platforms/pseries/lparcfg.c >> +++ b/arch/powerpc/platforms/pseries/lparcfg.c >> @@ -136,6 +136,57 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data >> *ppp_data) >> return rc; >> } >> >> +/* >> + * Based on H_GetPerformanceCounterInfo v1.10. >> + */ >> +static void show_gpci_data(struct seq_file *m) >> +{ >> + struct perf_counter_info_params { >> + __be32 counter_request; >> + __be32 starting_index; >> + __be16 secondary_index; >> + __be16 returned_values; >> + __be32 detail_rc; >> + __be16 counter_value_element_size; >> + u8 counter_info_version_in; >> + u8 counter_info_version_out; >> + u8 reserved[0xC]; >> + } __packed; > > This looks to duplicate the hv_get_perf_counter_info_params struct from > arch/powerpc/perf/hv-gpci.h. Maybe this include file needs to move to > arch/powerpc/asm/inlcude so you don't have to redefine this struct. > >> + struct hv_gpci_request_buffer { >> + struct perf_counter_info_params params; >> + u8 output[4096 - sizeof(struct perf_counter_info_params)]; >> + } __packed; > > This struct is code duplication of the one defined in > arch/powerpc/perf/hv-gpci.c and could be moved into hv-gpci.h along with > HGPCI_MAX_DATA_BYTES so that you can use those versions here. I tend to agree with these comments. >> + struct hv_gpci_request_buffer *buf; >> + long ret; >> + unsigned int affinity_score; >> + >> + buf = kmalloc(sizeof(*buf), GFP_KERNEL); >> + if (buf == NULL) >> + return; >> + >> + /* >> + * Show the local LPAR's affinity score. >> + * >> + * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall. >> + * The score is at byte 0xB in the output buffer. >> + */ >> + memset(&buf->params, 0, sizeof(buf->params)); >> + buf->params.counter_request = cpu_to_be32(0xB1); >> + buf->params.starting_index = cpu_to_be32(-1); /* local LPAR */ >> + buf->params.counter_info_version_in = 0x5; /* v5+ for score */ >> + ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf), >> + sizeof(*buf)); >> + if (ret != H_SUCCESS) { >> + pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n", >> + ret, be32_to_cpu(buf->params.detail_rc)); >> + goto out; >> + } >> + affinity_score = buf->output[0xB]; >> + seq_printf(m, "partition_affinity_score=%u\n", affinity_score); >> +out: >> + kfree(buf); >> +} >> + > > IIUC we should already be able to get this value from userspace using perf > tool, > right? If thats the case can't we also programatically retrieve it via the > perf_event interface in userspace as well? No... I had the same thought when I first looked at this, but perf is for sampling counters, and the partition affinity score is not a counter.
