(For those following along at home, a fix was merged yesterday.) On Mar 12, 2017 01:04, "Rob Landley" <r...@landley.net> wrote:
> On 03/11/2017 01:22 PM, enh wrote: > > from the line number addr2line gave above, it's the line marked with ! > > below. i've included the block between too, since that modifies > > new->child. > > > > // Recurse down into tasks, retaining thread groups. > > // Disable show_process at least until we can calculate tcount > > kcount = TT.kcount; > > sprintf(toybuf, "/proc/%u/task", pid); > > new->child = dirtree_flagread(toybuf, DIRTREE_SHUTUP|DIRTREE_PROC, > get_ps); > > TT.threadparent = 0; > > kcount = TT.kcount-kcount+1; > > tb = (void *)new->extra; > > tb->slot[SLOT_tcount] = kcount; > > > > // Fill out tid and thread count for each entry in group > > ! if (new->child) for (dt = new->child->child; dt; dt = dt->next) { > > > > so i assume it's that dirtree_flagread that's failing, returning > > DIRTREE_ABORTVAL, and since the code that follows just assumes it's > > either valid or null, we die on the line marked ! (as claimed by the > > stack trace). > > > > i don't know is why why dirtree_flagread is failing, though, and can't > > reproduce it. > > Oh that's easy to see from context: /proc/%u exited before /proc/$u/task > could be opened, so dirtree_flagread() went "nope" and we didn't check > for it. > > Asynchronous process exit while we're reading its data is something I > tried to be robust against in the rest of ps, usually by effectively > moving the process exit a milisecond or so earlier. If we couldn't read > the data because process go bye-bye how does that differ from process go > bye-bye an instant before we would have noticed it? > > Looks like I messed up here going back and adding thread support because > the dirtree_flagread() semantics are non-obvious. (The > dirtree_flagread() reopen in thread scanning is a rough edge in the > design. I'm splicing together discontiguous trees manually and the > result doesn't have the checks and review the existing dirtree.c stuff > does.) > > And that's my real concern here: the dirtree_flagread() semantics > _shouldn't_ be non-obvious. Should I change dirtree_flagread() to always > return NULL or valid? Why did I have it NOT do that in the first place > (to the point I commented it not doing that). I vaguely recall there's > some error condition that the caller can only distinguish if it gets the > extra error return to distinguish the case, but I don't remember _what_ > the condition is. (Exists but permissions wrong to actually read its > contents, maybe?) > > Alas my toybox work directory's in a bit of a shambles at the moment > because I've got half-finished make.sh changes for getconf.c and > half-finished help text parsing changes, and between the two of 'em it > doesn't _build_ at the moment. (My two weeks in tokyo and the week in > portland before that I was too busy with other stuff to close any tabs. > Happy to be home, but torn between resting and shoveling out...) > > I can do a quick fix here (just add a test for abortval on that line) > but I'd rather change flagread not to return abortval, which means > understanding what I lose by doing that and being comfortable with it. > (And why the _other_ calls to flagread aren't dying more regularly with > abortval returns? How is this NOT screwing up ps with no /proc ?) > > But once again, I'm only getting to this at 4am. (Ok, mostly that's > jetlag royally horking my schedule.) Might not get to this tomorrow if > getconf build infrastructure continues to be stroppy, but it's #3 on my > toybox todo list right now. > > Thanks, > > Rob >
_______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net