On 2024/8/29 20:46, Pierre Gondois wrote:
> Hello Yicong,
>
> On 8/29/24 09:40, Yicong Yang wrote:
>> Hi Pierre,
>>
>> On 2024/8/27 23:40, Pierre Gondois wrote:
>>> Hello Yicong,
>>> IIUC we are looking for the maximum number of threads a CPU can have
>>> in the platform. But is it actually possible to have a platform with
>>> CPUs not having the same number of threads?
>>>
>>
>> I was thinking of heterogeneous platforms, for example where part of
>> the cores have SMT and others don't (I don't have such a platform,
>> but there should be some for arm64).
>>
>>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>>> that the number of threads is either 1 or cpu_smt_max_threads (as
>>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>>> a (hypothetical) platform with:
>>> - CPU0: 2 threads
>>> - CPU1: 4 threads
>>> should not be able to set the number of threads to 2, as
>>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>>
>>
>> It's a bit more complex. If we enable/disable SMT using the on/off
>> control then we won't have this problem. We'll online all the CPUs on
>> enabling and offline the CPUs which are not primary threads, without
>> caring about the thread number of each core.
>>
>> Control using the thread number is introduced by
>> CONFIG_SMT_NUM_THREADS_DYNAMIC and only enabled on powerpc. I think
>> this interface is not enough to handle the hypothetical case we
>> assumed, since it assumes a homogeneous platform where all the CPUs
>> have the same thread number. But this should be fine for platforms
>> mixing SMT cores (with the same thread number) and non-SMT cores,
>> since it does indicate the real thread number of the SMT cores.
>>
>>> So if there is an assumption that all the CPUs have the same number
>>> of threads, it should be sufficient to count the number of threads
>>> for one CPU, right?
>>>
>>
>> Naturally and conveniently, such a solution may use the thread number
>> of CPU0 (the boot CPU).
>> But this will be wrong if CPU0 is non-SMT on a heterogeneous
>> platform :(
>>
>>> In the other (unlikely) case (i.e. CPUs can have various numbers of
>>> threads), I think we should either:
>>> - use your method and check that all the CPUs have the same number
>>> of threads
>>> - get the number of threads for one CPU and check that all the CPUs
>>> are identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>>
>> I think this won't be simpler, since we'd still need to iterate over
>> all the CPUs to see if they have the same hetero_id (I checked how
>> we're using this ACPI tag in arm_acpi_register_pmu_device()). We can
>> already know whether they're identical by comparing the thread
>> numbers, and do the update if necessary.
>>
>>> - have a per-cpu cpu_smt_max_threads value ?
>>
>> This should be unnecessary at the current stage, unless platforms
>> with several kinds of cores having different thread numbers (as in
>> your assumption) appear and we enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>> on them. Otherwise using a global cpu_smt_max_threads to enable the
>> SMT control should be enough. Does it make sense?
>
> Yes, I agree with all the things you said:
> - the current smt/control interface cannot handle asymmetric SMT
> platforms
> - I am not aware of any such platform existing so far
>
> I think it would still be good to check that all the CPUs have the
> same number of threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC
> Kconfig with the patch attached (and to check that CPUs have the same
> number of threads). Feel free to take what you like/ignore what is
> inside the attached patch, or let me know if I should submit a part in
> a separate patchset,
>
Checked the changes. We can make this series the basic support, and a
separate series of yours a further extension of SMT control on arm64:
- support thread_id on ACPI based arm64 platforms
- support partial SMT control by enabling CONFIG_SMT_NUM_THREADS_DYNAMIC

Some minor comments below.

> ----------------------------
>
> [RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
> - On arm64 ACPI systems, change the thread_id assignment to have
> increasing values starting from 0. This is already the case for DT
> based systems. Doing so allows uniformly identifying the n-th
> thread of a given CPU.
> - Check that all CPUs have the same number of threads (for DT/ACPI)
> - Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
> On a Tx2, with 256 CPUs, thread siblings being 0,32,64,96
> for socket0 and 128 + (0,32,64,96) for socket1:
> $ cd /sys/devices/system/cpu/smt/
> $ cat ../online
> 0-255
> $ echo 2 > control
> $ cat ../online
> 0-63,128-191
> $ echo 3 > control
> $ cat ../online
> 0-95,128-223
> $ echo on > control
> $ cat ../online
> 0-255

So it's a real SMT4 system; it does make sense to have this partial SMT
control support.
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index bd3bc2f5e0ec..1d8521483065 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -239,6 +239,7 @@ config ARM64
>  	select HAVE_GENERIC_VDSO
>  	select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>  	select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
> +	select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
>  	select IRQ_DOMAIN
>  	select IRQ_FORCED_THREADING
>  	select KASAN_VMALLOC if KASAN
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index 0f6ef432fb84..7dd211f81687 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
>  #define arch_scale_hw_pressure		topology_get_hw_pressure
>  #define arch_update_hw_pressure	topology_update_hw_pressure
>
> +#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
> +#include <linux/cpu_smt.h>
> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
> +{
> +	return topology_thread_id(cpu) < cpu_smt_num_threads;
> +}
> +#endif
> +
>  #include <asm-generic/topology.h>
>
>  #endif /* _ASM_ARM_TOPOLOGY_H */
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index f72e1e55b05e..a83babe19972 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
>  {
>  	int thread_num, max_smt_thread_num = 1;
>  	struct xarray core_threads;
> +	bool have_thread = false;
>  	int cpu, topology_id;
> +	unsigned long i;
>  	void *entry;
>
>  	if (acpi_disabled)
> @@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
>  			return topology_id;
>
>  		if (acpi_cpu_is_threaded(cpu)) {
> +			have_thread = true;
> +
>  			cpu_topology[cpu].thread_id = topology_id;
>  			topology_id = find_acpi_cpu_topology(cpu, 1);
>  			cpu_topology[cpu].core_id = topology_id;
> @@ -69,9 +73,10 @@
>  			if (!entry) {
>  				xa_store(&core_threads, topology_id,
>  					 xa_mk_value(1), GFP_KERNEL);
> +				cpu_topology[cpu].thread_id = 0;
>  			} else {
>  				thread_num = xa_to_value(entry);
> -				thread_num++;
> +				cpu_topology[cpu].thread_id = thread_num++;
>  				xa_store(&core_threads, topology_id,
>  					 xa_mk_value(thread_num), GFP_KERNEL);
>
> @@ -86,8 +91,17 @@
>  		cpu_topology[cpu].cluster_id = topology_id;
>  		topology_id = find_acpi_cpu_topology_package(cpu);
>  		cpu_topology[cpu].package_id = topology_id;
> +
> +		pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
> +			 cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
> +			 cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
>  	}
>
> +	if (have_thread)

We could know this from max_smt_thread_num, so have_thread may be
unnecessary.

> +		xa_for_each(&core_threads, i, entry)
> +			if (xa_to_value(entry) != max_smt_thread_num)
> +				pr_warn("Heterogeneous SMT topology not handled");

Wondering if we can avoid this 2nd loop. Greg expressed worries about
looping twice on large-scale systems in v1. Maybe we could use the
hetero_id and get the necessary information in one loop; I need to
think about it further. Thanks.
> +
>  	cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>
>  	xa_destroy(&core_threads);
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 95513abd664f..20d7f5b72ddd 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
>  	return cpu;
>  }
>
> -static void __init update_smt_num_threads(unsigned int num_threads)
> +static void __init update_smt_num_threads(int num_threads)
>  {
> -	static unsigned int max_smt_thread_num = 1;
> +	static int max_smt_thread_num = -1;
>
> -	if (num_threads > max_smt_thread_num) {
> +	if (max_smt_thread_num < 0) {
>  		max_smt_thread_num = num_threads;
>  		cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
> +	} else if (num_threads != max_smt_thread_num) {
> +		pr_warn("Heterogeneous SMT topology not handled");
>  	}
>  }
>
> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> index b721f360d759..afdfdc64a0a1 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
>  #define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
>  #define topology_cluster_id(cpu)	(cpu_topology[cpu].cluster_id)
>  #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
> +#define topology_thread_id(cpu)		(cpu_topology[cpu].thread_id)
>  #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
>  #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
>  #define topology_cluster_cpumask(cpu)	(&cpu_topology[cpu].cluster_sibling)
>
> ----------------------------
>
>
> Regards,
> Pierre
>
>>
>> Thanks,
>> Yicong
>>
>>>
>>> Same comment for the DT patch. If there is an assumption that all CPUs have
>>> the same number of threads, then update_smt_num_threads() could only
>>> be called once, I suppose.
>>>
>>> Regards,
>>> Pierre
>>>
>>>
>>> On 8/6/24 10:53, Yicong Yang wrote:
>>>> From: Yicong Yang <yangyic...@hisilicon.com>
>>>>
>>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>>> get the SMT number of each core. Instead, use a temporary xarray
>>>> to record the SMT number of each core when building the topology,
>>>> so we can know the largest SMT number in the system. Then we can
>>>> enable the support of SMT control.
>>>>
>>>> Signed-off-by: Yicong Yang <yangyic...@hisilicon.com>
>>>> ---
>>>>  arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>>>  1 file changed, 24 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>>> --- a/arch/arm64/kernel/topology.c
>>>> +++ b/arch/arm64/kernel/topology.c
>>>> @@ -15,8 +15,10 @@
>>>>  #include <linux/arch_topology.h>
>>>>  #include <linux/cacheinfo.h>
>>>>  #include <linux/cpufreq.h>
>>>> +#include <linux/cpu_smt.h>
>>>>  #include <linux/init.h>
>>>>  #include <linux/percpu.h>
>>>> +#include <linux/xarray.h>
>>>>  #include <asm/cpu.h>
>>>>  #include <asm/cputype.h>
>>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>>>   */
>>>>  int __init parse_acpi_topology(void)
>>>>  {
>>>> +	int thread_num, max_smt_thread_num = 1;
>>>> +	struct xarray core_threads;
>>>>  	int cpu, topology_id;
>>>> +	void *entry;
>>>>  	if (acpi_disabled)
>>>>  		return 0;
>>>> +	xa_init(&core_threads);
>>>> +
>>>>  	for_each_possible_cpu(cpu) {
>>>>  		topology_id = find_acpi_cpu_topology(cpu, 0);
>>>>  		if (topology_id < 0)
>>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>>>  			cpu_topology[cpu].thread_id = topology_id;
>>>>  			topology_id = find_acpi_cpu_topology(cpu, 1);
>>>>  			cpu_topology[cpu].core_id = topology_id;
>>>> +
>>>> +			entry = xa_load(&core_threads, topology_id);
>>>> +			if (!entry) {
>>>> +				xa_store(&core_threads, topology_id,
>>>> +					 xa_mk_value(1), GFP_KERNEL);
>>>> +			} else {
>>>> +				thread_num = xa_to_value(entry);
>>>> +				thread_num++;
>>>> +				xa_store(&core_threads, topology_id,
>>>> +					 xa_mk_value(thread_num), GFP_KERNEL);
>>>> +
>>>> +				if (thread_num > max_smt_thread_num)
>>>> +					max_smt_thread_num = thread_num;
>>>> +			}
>>>>  		} else {
>>>>  			cpu_topology[cpu].thread_id = -1;
>>>>  			cpu_topology[cpu].core_id = topology_id;
>>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>>>  		cpu_topology[cpu].package_id = topology_id;
>>>>  	}
>>>> +	cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>>> +
>>>> +	xa_destroy(&core_threads);
>>>>  	return 0;
>>>>  }
>>>>  #endif
>>>
>>> .