Shrikanth Hegde <sshe...@linux.ibm.com> writes: > On 4/5/24 6:19 PM, Nathan Lynch wrote: >> Shrikanth Hegde <sshe...@linux.ibm.com> writes: > > Hi Nathan, Thanks for reviewing this. > >>> When there are no options specified for lparstat, it is expected to >>> give reports since LPAR(Logical Partition) boot. App is an indicator >>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is >>> derived using pool_idle_time which is obtained using H_PIC call. >> >> If "App" is short for "Available Procesoor Pool" then it should be >> written "APP" and the it should be introduced and defined more clearly >> than this. >> > > Ok.I reworded it for v2. > > yes APP is Available Processor Pool. > >> >>> The interval based reports show correct App value while since boot >>> report shows very high App values. This happens because in that case app >>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle >>> time is reported by the PowerVM hypervisor since its boot, it need not >>> align with LPAR boot. This leads to large App values. >>> >>> To fix that export boot pool idle time in lparcfg and powerpc-utils will >>> use this info to derive App as below for since boot reports. >>> >>> App = (pool idle time - boot pool idle time) / (uptime * timebase) >>> >>> Results:: Observe app values. >>> ====================== Shared LPAR ================================ >>> lparstat >>> System Configuration >>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00 >>> >>> reboot >>> stress-ng --cpu=$(nproc) -t 600 >>> sleep 600 >>> So in this case app is expected to close to 37-6=31. >>> >>> ====== 6.9-rc1 and lparstat 1.3.10 ============= >>> %user %sys %wait %idle physc %entc lbusy app vcsw phint >>> ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- >>> 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21 >>> >>> === With this patch and powerpc-utils patch to do the above equation === >>> %user %sys %wait %idle physc %entc lbusy app vcsw phint >>> ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- >>> 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21 >>> ===================================================================== >>> >>> Note: physc, purr/idle purr being inaccurate is being handled in a >>> separate patch in powerpc-utils tree. >>> >>> Signed-off-by: Shrikanth Hegde <sshe...@linux.ibm.com> >>> --- >>> Note: >>> >>> This patch needs to merged first in the kernel for the powerpc-utils >>> patches to work. powerpc-utils patches will be posted to its mailing >>> list and link would be found in the reply to this patch if available. >>> >>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c >>> b/arch/powerpc/platforms/pseries/lparcfg.c >>> index f73c4d1c26af..8df4e7c529d7 100644 >>> --- a/arch/powerpc/platforms/pseries/lparcfg.c >>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c >>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time, >>> return rc; >>> } >>> >>> +unsigned long boot_pool_idle_time; >> >> Should be static, and u64. Better to use explicitly sized types for data >> at the kernel-hypervisor boundary. > > Current usage of h_pic doesn't follow this either. Are you suggesting we > change that > as well?
Yes pretty much. h_pic() as currently written and used has some problems: static unsigned h_pic(unsigned long *pool_idle_time, unsigned long *num_procs) { unsigned long rc; unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; rc = plpar_hcall(H_PIC, retbuf); *pool_idle_time = retbuf[0]; *num_procs = retbuf[1]; return rc; } * Coerces the return value of plpar_hcall() to unsigned -- hcall errors are negative. * Assigns *pool_idle_time and *num_procs using uninitialized data when H_PIC is unsuccessful. * Assigns the outparams unconditionally; would be nicer if it allowed callers to pass NULL so they don't have to provide dummy inputs that aren't even used, as in your change. * Should follow Linux -errno return value convention in the absence of a need for the specific hcall status in its callers. >>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void) >>> printk(KERN_ERR "Failed to create powerpc/lparcfg\n"); >>> return -EIO; >>> } >>> + >>> + h_pic(&boot_pool_idle_time, &num_procs); >> >> h_pic() can fail, leaving the out parameters uninitialized. > > Naveen pointed to me this a while ago, but I forgot. > > Currently h_pic return value is not checked at all, either at boor or at > runtime. > When it fails, should we re-try or just print a kernel debug? What would be > expected > behavior? because if it fails, it would anyway result in wrong values of app > even > if the variables are initialized to 0. There's nothing in the spec for H_PIC that suggests retrying on failure. I'm not really in favor of printing a message about it either; that practice tends to result in log noise in non-PowerVM guests. When H_PIC doesn't work the corresponding lines should not be emitted in /proc/powerpc/lparcfg IMO.