Hi Tyrel,

Tyrel Datwyler <[email protected]> writes:
> On 6/19/20 8:34 AM, Scott Cheloha wrote:
>> The H_GetPerformanceCounterInfo PHYP hypercall has a subcall,
>> Affinity_Domain_Info_By_Partition, which returns, among other things,
>> a "partition affinity score" for a given LPAR.  This score, a value on
>> [0-100], represents the processor-memory affinity for the LPAR in
>> question.  A score of 0 indicates the worst possible affinity while a
>> score of 100 indicates perfect affinity.  The score can be used to
>> reason about performance.
>> 
>> This patch adds the score for the local LPAR to the lparcfg procfile
>> under a new 'partition_affinity_score' key.
>
> I expect that you will probably get a NACK from Michael on this. The overall
> desire is to move away from these dated /proc interfaces. While its true that 
> I
> did add a new value recently it was strictly to facilitate and correct the
> calculation of a derived value that was already dependent on a couple other
> existing values in lparcfg.
>
> With that said I would expect that you would likely be advised to expose this 
> as
> a sysfs attribute. The question is where? We probably should put some thought 
> in
> to this as I would like to port each lparcfg value over to sysfs so that we 
> can
> move to deprecating lparcfg. Putting everything under something like
> /sys/kernel/lparcfg/* maybe. Michael may have a better suggestion.

I think this score fits pretty naturally in lparcfg: it's a simple
metric that is specific to the pseries/papr platform, like everything
else in there.

A few dozen key=value pairs contained in a single file is simple and
efficient, unlike sysfs with its rather inconsistently applied
one-value-per-file convention. Surely it's OK if lparcfg gains a line
every few years?


>> The H_GetPerformanceCounterInfo hypercall is already used elsewhere in
>> the kernel, in powerpc/perf/hv-gpci.c.  Refactoring that code and this
>> code into a more general API might be worthwhile if additional modules
>> require the hypercall in the future.
>
> If you are duplicating code its likely you should already be doing this. See 
> the
> rest of my comments about below.
>
>> 
>> Signed-off-by: Scott Cheloha <[email protected]>
>> ---
>>  arch/powerpc/platforms/pseries/lparcfg.c | 53 ++++++++++++++++++++++++
>>  1 file changed, 53 insertions(+)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
>> b/arch/powerpc/platforms/pseries/lparcfg.c
>> index b8d28ab88178..b75151eee0f0 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -136,6 +136,57 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data 
>> *ppp_data)
>>      return rc;
>>  }
>>  
>> +/*
>> + * Based on H_GetPerformanceCounterInfo v1.10.
>> + */
>> +static void show_gpci_data(struct seq_file *m)
>> +{
>> +    struct perf_counter_info_params {
>> +            __be32 counter_request;
>> +            __be32 starting_index;
>> +            __be16 secondary_index;
>> +            __be16 returned_values;
>> +            __be32 detail_rc;
>> +            __be16 counter_value_element_size;
>> +            u8     counter_info_version_in;
>> +            u8     counter_info_version_out;
>> +            u8     reserved[0xC];
>> +    } __packed;
>
> This looks to duplicate the hv_get_perf_counter_info_params struct from
> arch/powerpc/perf/hv-gpci.h. Maybe this include file needs to move to
> arch/powerpc/asm/inlcude so you don't have to redefine this struct.
>
>> +    struct hv_gpci_request_buffer {
>> +            struct perf_counter_info_params params;
>> +            u8 output[4096 - sizeof(struct perf_counter_info_params)];
>> +    } __packed;
>
> This struct is code duplication of the one defined in
> arch/powerpc/perf/hv-gpci.c and could be moved into hv-gpci.h along with
> HGPCI_MAX_DATA_BYTES so that you can use those versions here.

I tend to agree with these comments.


>> +    struct hv_gpci_request_buffer *buf;
>> +    long ret;
>> +    unsigned int affinity_score;
>> +
>> +    buf = kmalloc(sizeof(*buf), GFP_KERNEL);
>> +    if (buf == NULL)
>> +            return;
>> +
>> +    /*
>> +     * Show the local LPAR's affinity score.
>> +     *
>> +     * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall.
>> +     * The score is at byte 0xB in the output buffer.
>> +     */
>> +    memset(&buf->params, 0, sizeof(buf->params));
>> +    buf->params.counter_request = cpu_to_be32(0xB1);
>> +    buf->params.starting_index = cpu_to_be32(-1);   /* local LPAR */
>> +    buf->params.counter_info_version_in = 0x5;      /* v5+ for score */
>> +    ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf),
>> +                             sizeof(*buf));
>> +    if (ret != H_SUCCESS) {
>> +            pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n",
>> +                     ret, be32_to_cpu(buf->params.detail_rc));
>> +            goto out;
>> +    }
>> +    affinity_score = buf->output[0xB];
>> +    seq_printf(m, "partition_affinity_score=%u\n", affinity_score);
>> +out:
>> +    kfree(buf);
>> +}
>> +
>
> IIUC we should already be able to get this value from userspace using perf 
> tool,
> right? If thats the case can't we also programatically retrieve it via the
> perf_event interface in userspace as well?

No... I had the same thought when I first looked at this, but perf is
for sampling counters, and the partition affinity score is not a
counter.

Reply via email to