On 11/20/2018 10:12 AM, Qian Cai wrote:
>
>> On Nov 20, 2018, at 8:50 AM, Waiman Long <long...@redhat.com> wrote:
>>
>> On 11/20/2018 01:42 AM, Qian Cai wrote:
>>> The current value of the early boot static pool size is not big enough
>>> for systems with a large number of CPUs when timer and/or workqueue
>>> objects are selected. As a result, systems with 60+ CPUs and both timer
>>> and workqueue objects enabled could trigger "ODEBUG: Out of memory.
>>> ODEBUG disabled". Fix it by computing the size according to
>>> CONFIG_NR_CPUS and the CONFIG_DEBUG_OBJECTS_* options.
>>>
>>> Signed-off-by: Qian Cai <c...@gmx.us>
>>> ---
>>>  lib/debugobjects.c | 53 +++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 52 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>>> index 70935ed91125..372dc34206d5 100644
>>> --- a/lib/debugobjects.c
>>> +++ b/lib/debugobjects.c
>>> @@ -23,7 +23,53 @@
>>>  #define ODEBUG_HASH_BITS	14
>>>  #define ODEBUG_HASH_SIZE	(1 << ODEBUG_HASH_BITS)
>>>
>>> +/*
>>> + * Some debug objects are allocated during early boot. Enabling some
>>> + * options like timer or workqueue objects may increase the size required
>>> + * significantly with a large number of CPUs. For example,
>>> + *
>>> + * No. CPUs x 2 (worker pool) objects:
>>> + *
>>> + * start_kernel
>>> + * workqueue_init_early
>>> + * init_worker_pool
>>> + * init_timer_key
>>> + * debug_object_init
>>> + *
>>> + * No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
>>> + *
>>> + * sched_init
>>> + * hrtick_rq_init
>>> + * hrtimer_init
>>> + *
>>> + * CONFIG_DEBUG_OBJECTS_WORK:
>>> + * No. CPUs x 6 (workqueue) objects:
>>> + *
>>> + * workqueue_init_early
>>> + * alloc_workqueue
>>> + * __alloc_workqueue_key
>>> + * alloc_and_link_pwqs
>>> + * init_pwq
>>> + *
>>> + * Also, plus No. CPUs objects:
>>> + *
>>> + * perf_event_init
>>> + * __init_srcu_struct
>>> + * init_srcu_struct_fields
>>> + * init_srcu_struct_nodes
>>> + * __init_work
>>> + *
>>> + * Increase the number a bit more in case the implementations change in
>>> + * the future.
>>> + */
>>> +#if defined(CONFIG_NR_CPUS) && defined(CONFIG_DEBUG_OBJECTS_TIMERS) && \
>>> +!defined(CONFIG_DEBUG_OBJECTS_WORK)
>>> +#define ODEBUG_POOL_SIZE (CONFIG_NR_CPUS * 10)
>>> +#elif defined(CONFIG_NR_CPUS) && defined(CONFIG_DEBUG_OBJECTS_WORK)
>>> +#define ODEBUG_POOL_SIZE (CONFIG_NR_CPUS * 30)
>>> +#else
>>>  #define ODEBUG_POOL_SIZE 1024
>>> +#endif /* CONFIG_NR_CPUS */
>>>  #define ODEBUG_POOL_MIN_LEVEL 256
>>>
>> CONFIG_NR_CPUS is always defined. You don't need to put that in a #if
>> condition. Where does the scaling factor 30 come from? It looks high to me.
> Hmm, it looks like some architectures could have it undefined, since it
> depends on CONFIG_SMP and the latter can be disabled. For example, alpha:
>
> config NR_CPUS
>         int "Maximum number of CPUs (2-32)"
>         range 2 32
>         depends on SMP

include/linux/threads.h:

#ifndef CONFIG_NR_CPUS
/* FIXME: This should be fixed in the arch's Kconfig */
#define CONFIG_NR_CPUS	1
#endif
> Scaling factor 30 came from the data. With all the debug_objects options
> enabled, I have,
>
> 64-CPU:  ODEBUG: 1114 of 1114 active objects replaced
> 256-CPU: ODEBUG: 4378 of 4378 active objects replaced
>
> I also give a bit of room for growth in the future, since the
> implementation details could always change.

(4378 - 1114)/(256 - 64) = 17

So the max scaling factor is 17. I would say you could round it up to 20
at most.

Cheers,
Longman