On Sun, Dec 15, 2024 at 02:58:26AM +0100, Gianfranco Trad wrote: > Hi Kent, > > I wanted to follow up on this patch. Over the last few days, I've > investigated it further and observed the following that might be of help: > > 1- zero-initing whole b struct (as the first patch version) leads to a clean > log [1]. > While if trying to memset to 0 only b.k field log reports [2]: > > bcachefs (da441363-bb6a-4ab9-999b-c1f40db4fee2): filesystem UUID already > open > bcachefs (da441363-bb6a-4ab9-999b-c1f40db4fee2): shutdown complete > bcachefs: bch2_fs_get_tree() error: EINVAL > > > 2- Given both versions of the patch didn't trigger the uninit issue anymore > I checked whether inner fields of b.k.bucket are correctly inited, just > before the bug triggers. > b.k.bucket fields seeming to look correctly inited: snapshot = 0, offset = > 34, inode = 0, gen = 0. > > 3- The first of the 2 reproducers causes a segfault: > > ==9335== Invalid[ 264.802101][ T9346] read of size 1 > ==9335== at 0x483BC39: strnlen (vg_replace_strmem.c:426) > ==9335== by 0x1098F0: netlink_query_family_id (repro.c:176) > ==9335== by 0x109A51: syz_genetlink_get_family_id (repro.c:211) > ==9335== by 0x10B476: execute_one (repro.c:2071) > ==9335== by 0x10B1A5: loop (repro.c:745) > ==9335== by 0x10B52F: main (repro.c:2088) > ==9335== Address 0x0 is not stack'd, malloc'd or (recently) free'd > > As of now, it seems unrelated to the root cause of the reported syzbot bug. > > > At this point, zero-initializing the entire struct seems to work reliably, > thought I'm still trying to get the full picture on this bug.
I think this might be a padding issue. C struct literals are supposed to initialize the whole struct, but the compiler folks in their infinite wisdom decided that didn't apply to padding. I think this needs to be raised with the KMSAN and compiler folks, as inserting memsets all over the place would be a sad workaround.