Ira Weiny <[email protected]> writes:

> On Sat, Sep 12, 2020 at 01:36:46AM +0530, Vaibhav Jain wrote:
>> Thanks for reviewing this patch Ira,
>>
>>
>> Ira Weiny <[email protected]> writes:
>>
>> > On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote:
>> >> A warning is reported by the kernel in case perf_stats_show() returns
>> >> an error code. The warning is of the form below:
>> >>
>> >> papr_scm ibm,persistent-memory:ibm,pmemory@44100001:
>> >> 	  Failed to query performance stats, Err:-10
>> >> dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>> >> fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
>> >>
>> >> On investigation it looks like the compiler is silently truncating
>> >> the return value of drc_pmem_query_stats() from 'long' to 'int', since
>> >> the variable used to store the return code 'rc' is an 'int'. This
>> >> truncated value is then returned as a 'ssize_t' from
>> >> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
>> >> unsigned number and triggers this warning.
>> >>
>> >> To fix this we update the type of variable 'rc' from 'int' to
>> >> 'ssize_t' that prevents the compiler from truncating the return value
>> >> of drc_pmem_query_stats() and returning correct signed value back from
>> >> perf_stats_show().
>> >>
>> >> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
>> >> stats from PHYP')
>> >> Signed-off-by: Vaibhav Jain <[email protected]>
>> >> ---
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
>> >>  1 file changed, 2 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index a88a707a608aa..9f00b61676ab9 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> >>  static ssize_t perf_stats_show(struct device *dev,
>> >>  			       struct device_attribute *attr, char *buf)
>> >>  {
>> >> -	int index, rc;
>> >> +	int index;
>> >> +	ssize_t rc;
>> >
>> > I'm not sure this is really fixing everything here.
>>
>> The issue is with the statement in perf_stats_show():
>>
>> 'return rc ? rc : seq_buf_used(&s);'
>>
>> The function seq_buf_used() returns an 'unsigned int' and with 'rc'
>> typed as 'int', forces a promotion of the expression to 'unsigned int'
>> which causes a loss of signedness of 'rc' and compiler silently
>> assigns this unsigned value to the function return typed as 'signed
>> long'.
>>
>> Making 'rc' a 'signed long' forces a promotion of the expression to
>> 'signed long' which preserves the signedness of 'rc' and will also be
>> compatible with the function return type.
>
> Ok, ok, I read this all wrong.
>
> FWIW I would also cast seq_buf_used() to ssize_t to show you know what you are
> doing there.
>
>>
>> >
>> > drc_pmem_query_stats() can return negative errno's.  Why are those not
>> > checked somewhere in perf_stats_show()?
>> >
>> For the specific invocation 'drc_pmem_query_stats(p, stats, 0)' we only
>> expect return value 'rc <=0' with '0' indicating a successful fetch of
>> nvdimm performance stats from hypervisor.
>> Hence there are no explicit
>> checks for negative error codes in the functions as all return values
>> !=0 indicate an error.
>>
>>
>> > It seems like all this fix is handling is a > 0 return value: 'ret[0]' from
>> > line 289 in papr_scm.c...  Or something?
>> No, in case the arg 'num_stats' is '0' and 'buff_stats != NULL' the
>> variable 'size' is assigned a non-zero value hence that specific branch
>> you mentioned is never taken. Instead in case of success
>> drc_pmem_query_stats() returns '0' and in case of an error a negative
>> error code is returned.
>>
>> >
>> > Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed
>> > value.
>> > Therefore, it should not be returning -errno.  I'm surprised the static
>> > checkers did not catch that.
>> Didn't quite get the assertion here. The function is marked to return
>> 'ssize_t' because we can return both +ve for success and -ve values to
>> indicate errors.
>
> Sorry I was reading this as size_t and meant to say unsigned...  I was looking
> at this too quickly.
>
>>
>> >
>> > I believe I caught similar errors with a patch series before which did not
>> > pay attention to variable types.
>> >
>> > Please audit this code for these types of errors and ensure you are really
>> > doing the correct thing when using the sysfs interface.  I'm pretty sure
>> > bad things will eventually happen (if they are not already) if you return
>> > some really big number to the sysfs core from *_show().
>> I think this problem is different compared to what you had previously pointed
>> to. The values returned from drc_pmem_query_stats() can be stored in an
>> 'int' variable too, however it was the silent promotion of a signed type
>> to unsigned type that caused this specific issue.
>
> Ok this makes more sense now.  Sorry about not looking more carefully.
>
> But I still think matching up the return of seq_buf_used() is worth it.
> I don't particularly like depending on 'automatic' promotions which make
> reviewing code harder like this.

Agree, I have sent out a v2 addressing this.
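
For anyone following along, the promotion described above can be reproduced
outside the kernel. The snippet below is only a sketch and not the papr_scm
code: used_bytes() is a made-up stand-in for seq_buf_used(), show_int_rc()
and show_ssize_rc() are hypothetical names, and it assumes an LP64 target
(32-bit 'int', 64-bit 'long'/'ssize_t').

```c
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-in for seq_buf_used(): returns an 'unsigned int'. */
static unsigned int used_bytes(void)
{
	return 42;
}

/*
 * 'rc' is an 'int'. The two result operands of ?: are 'int' and
 * 'unsigned int', so the expression has type 'unsigned int' and a
 * negative 'rc' becomes a large positive value before it is widened
 * to the 'ssize_t' return type.
 */
static ssize_t show_int_rc(int rc)
{
	return rc ? rc : used_bytes();
}

/*
 * 'rc' is an 'ssize_t'. With the explicit cast both operands are
 * 'ssize_t', so the sign of 'rc' is preserved and no implicit
 * conversion is left for the reader to work out.
 */
static ssize_t show_ssize_rc(ssize_t rc)
{
	return rc ? rc : (ssize_t)used_bytes();
}
```

On such an LP64 build show_int_rc(-10) comes back as 4294967286, the kind of
large count that dev_attr_show() complains about, while show_ssize_rc(-10)
stays at -10. Both return 42 when 'rc' is 0. The (ssize_t) cast is the
explicit form suggested above for seq_buf_used().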
>
> And sorry if my email seemed harsh I did not mean it to be.  I just like when
> types are more explicit because I feel like it can avoid issues like this.
> (Specifically my confusion over the types...)
>
> :-D

No worries :-)

Thanks,
-- 
Cheers
~ Vaibhav
_______________________________________________
Linux-nvdimm mailing list -- [email protected]
To unsubscribe send an email to [email protected]
