On Sat, Sep 12, 2020 at 01:36:46AM +0530, Vaibhav Jain wrote: > Thanks for reviewing this patch Ira, > > > Ira Weiny <ira.we...@intel.com> writes: > > > On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote: > >> A warning is reported by the kernel in case perf_stats_show() returns > >> an error code. The warning is of the form below: > >> > >> papr_scm ibm,persistent-memory:ibm,pmemory@44100001: > >> Failed to query performance stats, Err:-10 > >> dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count > >> fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count > >> > >> On investigation it looks like that the compiler is silently truncating the > >> return value of drc_pmem_query_stats() from 'long' to 'int', since the > >> variable used to store the return code 'rc' is an 'int'. This > >> truncated value is then returned back as a 'ssize_t' back from > >> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large > >> unsigned number and triggers this warning.. > >> > >> To fix this we update the type of variable 'rc' from 'int' to > >> 'ssize_t' that prevents the compiler from truncating the return value > >> of drc_pmem_query_stats() and returning correct signed value back from > >> perf_stats_show(). > >> > >> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance > >> stats from PHYP') > >> Signed-off-by: Vaibhav Jain <vaib...@linux.ibm.com> > >> --- > >> arch/powerpc/platforms/pseries/papr_scm.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > >> b/arch/powerpc/platforms/pseries/papr_scm.c > >> index a88a707a608aa..9f00b61676ab9 100644 > >> --- a/arch/powerpc/platforms/pseries/papr_scm.c > >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c > >> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor > >> *nd_desc, > >> static ssize_t perf_stats_show(struct device *dev, > >> struct device_attribute *attr, char *buf) > >> { > >> - int index, rc; > >> + int index; > >> + ssize_t rc; > > > > I'm not sure this is really fixing everything here. > > The issue is with the statement in perf_stats_show(): > > 'return rc ? rc : seq_buf_used(&s);' > > The function seq_buf_used() returns an 'unsigned int' and with 'rc' > typed as 'int', forces a promotion of the expression to 'unsigned int' > which causes a loss of signedness of 'rc' and compiler silently > assigns this unsigned value to the function return typed as 'signed > long'. > > Making 'rc', a 'signed long' forces a promotion of the expresion to > 'signed long' which preserves the signedness of 'rc' and will also be > compatible with the function return type.
Ok, ok, I read this all wrong. FWIW I would also cast seq_buf_used() to ssize_t to show you know what you are doing there. > > > > > drc_pmem_query_stats() can return negative errno's. Why are those not > > checked > > somewhere in perf_stats_show()? > > > For the specific invocation 'drc_pmem_query_stats(p, stats, 0)' we only > expect return value 'rc <=0' with '0' indicating a successful fetch of > nvdimm performance stats from hypervisor. Hence there are no explicit > checks for negative error codes in the functions as all return values > !=0 indicate an error. > > > > It seems like all this fix is handling is a > 0 return value: 'ret[0]' from > > line 289 in papr_scm.c... Or something? > No, in case the arg 'num_stats' is '0' and 'buff_stats != NULL' the > variable 'size' is assigned a non-zero value hence that specific branch > you mentioned is never taken. Instead in case of success > drc_pmem_query_stats() return '0' and in case of an error a negative > error code is returned. > > > > > Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed > > value. > > Therefore, it should not be returning -errno. I'm surprised the static > > checkers did not catch that. > Didnt quite get the assertion here. The function is marked to return > 'ssize_t' because we can return both +ve for success and -ve values to > indicate errors. Sorry I was reading this as size_t and meant to say unsigned... I was looking at this too quickly. > > > > > I believe I caught similar errors with a patch series before which did not > > pay > > attention to variable types. > > > > Please audit this code for these types of errors and ensure you are really > > doing the correct thing when using the sysfs interface. I'm pretty sure bad > > things will eventually happen (if they are not already) if you return some > > really big number to the sysfs core from *_show(). > I think this problem is different compared to what you had previously pointed > to. The values returned from drc_pmem_query_stats() can be stored in an > 'int' variable too, however it was the silent promotion of a signed type > to unsigned type was what caused this specific issue. Ok this makes more sense now. Sorry about not looking more carefully. But I still think matching up the return of seq_buf_used() is worth it. I don't particularly like depending on 'automatic' promotions which make reviewing code harder like this. And sorry if my email seemed harsh I did not mean it to be. I just like when types are more explicit because I feel like it can avoid issues like this. (Specifically my confusion over the types...) :-D Thanks, Ira