Hi James,

Thanks for the feedback.

>-----Original Message-----
>From: James Morse [mailto:[email protected]]
>Sent: 06 November 2020 19:34
>To: Shiju Jose <[email protected]>; [email protected];
>[email protected]; [email protected]; [email protected]; [email protected];
>[email protected]
>Cc: [email protected]; [email protected]; Linuxarm
><[email protected]>; Jonathan Cameron
><[email protected]>
>Subject: Re: [RFC PATCH 1/4] ACPI: PPTT: Fix for a high level cache node
>detected in the low level
>
>Hi Shiju, Jonathan,
>
>On 05/11/2020 17:42, Shiju Jose wrote:
>> From: Jonathan Cameron <[email protected]>
>>
>> According to the following sections of the PPTT definition in the ACPI
>> specification(V6.3), a high level cache node( For example L2 cache)
>> could be represented simultaneously both in the private resource of a
>> CPU node and via the next_level_of_cache pointer of a low level cache
>> node.
>> 1. Section 5.2.29.1 Processor hierarchy node structure (Type 0) "Each
>> processor node includes a list of resources that are private to that
>> node. Resources are described in other PPTT structures such as Type 1
>> cache structures. The processor node’s private resource list includes
>> a reference to each of the structures that represent private resources
>> to a given processor node. For example, an SoC level processor node
>> might contain two references, one pointing to a Level 3 cache resource
>> and another pointing to an ID structure."
>>
>> 2. Section 5.2.29.2 Cache Type Structure - Type 1
>>    Figure 5-26 Cache Type Structure - Type 1 Example
>
>'fix' in the subject makes me twitch ... is there a user-space visible bug
>because of this?
>
>
>> For the use case of creating EDAC device blocks for the CPU caches, we
>> need to search for cache node types in all levels using
>> acpi_find_cache_node(), as a platform independent solution to
>
>I'm nervous to base the edac user-space view of caches on something other
>than what is described in /sys/devices/system/cpu/cpu0/cache. These things
>have to match, otherwise user-space can't work out which cpu's L2's it should
>add to get the value for the physical cache.
With this fix the /sys/devices/system/edac/cpu/cpu0/cacheN match with  
/sys/devices/system/cpu/cpu0/cache/indexN.
and thus user-space could extract cpu list for the shared caches.

>
>Getting the data from somewhere else risks making this more complicated.
>
>Using the PPTT means this won't work on "HPE Server"s that use ghes_edac
>too. I don't think we should have any arm64 specific behaviour here.
>
>
>> retrieve the cache info from the ACPI PPTT. The reason is that
>> cacheinfo in the drivers/base/cacheinfo.c would not be populated in
>> this stage.
>
>Because both ghes_init() and cacheinfo_sysfs_init() are device_initcall()?
>
>Couldn't we fix this by making ghes_init(), device_initcall_sync() (with a
>comment saying what it depends on)

I checked by making ghes_init(), device_initcall_sync(). Then the
ghes_probe() and ghes_edac_register() are getting
called after detect_cache_attributes() of base/cacheinfo.c function is called, 
where per_cpu_cacheinfo is allocated and populated for the cpu online. 
Thus cacheinfo data is available for the online CPUs.

>
>
>I agree this means dealing with cpuhp as the cacheinfo data is only available
>for online CPUs.

Does this require the EDAC device instance for a cpu become online/offline to 
be added/deleted
on cpuhp notify functions in the ghes_edac because the cache structure among 
CPU cores would vary?
If so, I think edac device does not support dynamic addition/deletion of a 
device instance because
edac_device_alloc_ctl_info() pre-allocates memory for the internal edac dev 
structures for the number of 
instances(number of CPUs) and number of blocks(number of Caches) passed?

>
>
>> In this case, we found acpi_find_cache_node() mistakenly detecting
>> high level cache as low level cache, when the cache node is in the
>> processor node’s private resource list.
>>
>> To fix this issue add duplication check in the
>> acpi_find_cache_level(), for a cache node found in the private
>> resource of a CPU node with all the next level of caches present in the other
>cache nodes.
>
>I'm not overly familiar with the PPTT, is it possible this issue is visible in
>/sys/devices/system/cpu/cpu0/cache?
>
>
>Thanks,
>
>James
>
>
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index
>> 4ae93350b70d..de1dd605d3ad 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -132,21 +132,80 @@ static unsigned int acpi_pptt_walk_cache(struct
>acpi_table_header *table_hdr,
>>      return local_level;
>>  }
>>
>> +/**
>> + * acpi_pptt_walk_check_duplicate() - Find the cache resource to
>> +check,
>> + * is a duplication in the next_level_of_cache pointer of other cache.
>> + * @table_hdr: Pointer to the head of the PPTT table
>> + * @res: cache resource in the PPTT we want to walk
>> + * @res_check: cache resource in the PPTT we want to check for
>duplication.
>> + *
>> + * Given both PPTT resource, verify that they are cache nodes, then
>> +walk
>> + * down each level of cache @res, and check for the duplication.
>> + *
>> + * Return: true if duplication found, false otherwise.
>> + */
>> +static bool acpi_pptt_walk_check_duplicate(struct acpi_table_header
>*table_hdr,
>> +                                       struct acpi_subtable_header *res,
>> +                                       struct acpi_subtable_header
>*res_check) {
>> +    struct acpi_pptt_cache *cache;
>> +    struct acpi_pptt_cache *check;
>> +
>> +    if (res->type != ACPI_PPTT_TYPE_CACHE ||
>> +        res_check->type != ACPI_PPTT_TYPE_CACHE)
>> +            return false;
>> +
>> +    cache = (struct acpi_pptt_cache *)res;
>> +    check = (struct acpi_pptt_cache *)res_check;
>> +    while (cache) {
>> +            if (cache == check)
>> +                    return true;
>> +            cache = fetch_pptt_cache(table_hdr, cache-
>>next_level_of_cache);
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>  static struct acpi_pptt_cache *
>>  acpi_find_cache_level(struct acpi_table_header *table_hdr,
>>                    struct acpi_pptt_processor *cpu_node,
>>                    unsigned int *starting_level, unsigned int level,
>>                    int type)
>>  {
>> -    struct acpi_subtable_header *res;
>> +    struct acpi_subtable_header *res, *res2;
>>      unsigned int number_of_levels = *starting_level;
>>      int resource = 0;
>> +    int resource2 = 0;
>> +    bool duplicate = false;
>>      struct acpi_pptt_cache *ret = NULL;
>>      unsigned int local_level;
>>
>>      /* walk down from processor node */
>>      while ((res = acpi_get_pptt_resource(table_hdr, cpu_node,
>resource))) {
>>              resource++;
>> +            /*
>> +             * PPTT definition in the ACPI specification allows a high level
>cache
>> +             * node would be represented simultaneously both in the
>private resource
>> +             * of a CPU node and via the next_level_of_cache pointer of
>another cache node,
>> +             * within the same CPU hierarchy. This resulting
>acpi_find_cache_level()
>> +             * mistakenly detects a higher level cache node in the low
>level as well.
>> +             *
>> +             * Check a cache node in the private resource of the CPU node
>for any
>> +             * duplication.
>> +             */
>> +            resource2 = 0;
>> +            duplicate = false;
>> +            while ((res2 = acpi_get_pptt_resource(table_hdr, cpu_node,
>resource2))) {
>> +                    resource2++;
>> +                    if (res2 == res)
>> +                            continue;
>> +                    if (acpi_pptt_walk_check_duplicate(table_hdr, res2,
>res)) {
>> +                            duplicate = true;
>> +                            break;
>> +                    }
>> +            }
>> +            if (duplicate)
>> +                    continue;
>>
>>              local_level = acpi_pptt_walk_cache(table_hdr,
>*starting_level,
>>                                                 res, &ret, level, type);
>>

Thanks,
Shiju

Reply via email to