sammccall added a comment. Well, thanks for such a well-thought-out patch :-) I have to admit my first reaction was terror, but this seems solid to me.
================ Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1381 + auto &OS = llvm::errs(); + OS << "Signalled during AST action:\n"; + auto &Command = FileInputs.CompileCommand; ---------------- 0x1eaf wrote: > sammccall wrote: > > the name of the action is available as CurrentRequest->Action, which I > > think is safe to either pass into this function or read directly here. > I've noticed the the action name is always hardcoded as “Build AST > ({version})” and just logged the version directly below, but I guess it's not > forward compatible, so I'm going to log both… > the action name is always hardcoded as “Build AST ({version})” This shouldn't *always* be the case, there should be entries like "Hover" etc too, corresponding to `WorkScheduler->runWithAST("Hover", ...)` in ClangdServer.cpp. However the "build AST" actions are the ones that run the clang parser, so they're the slowest and crashiest for sure! ================ Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1392 + OS << "Contents:\n"; + OS << FileInputs.Contents << "\n"; +} ---------------- 0x1eaf wrote: > sammccall wrote: > > sammccall wrote: > > > hmm, this is potentially really useful (could gather full reproducers) > > > but also is going to dominate the output and make it annoying to see the > > > stack traces/other details. I'm worried this may make this feature > > > net-negative for a majority of users, who won't find the right parts of > > > the output to report... > > > > > > How critical do you think this is for debugging crashes in practice? (I > > > suspect fairly important!) > > > Since we're taking liberties with async-signal-safety anyway, it might be > > > possible to get this sent to a tempfile, which might be more useful. In > > > any case, we might want full file dumps to be off by default outside of > > > environments where anyone is likely to actually try to get the > > > reproducers. > > > > > > I'd probably suggest leaving this out of the initial patch entirely so we > > > can focus on getting the rest of the mechanism right. > > another thought along the lines of reproducers (but nothing for this > > patch)... > > > > If we're working on a file and have a preamble, then the headers that make > > up the preamble are relevant. In practice it's hard to reproduce bug > > reports we get because we need all their headers. If we're using a preamble > > then just dumping all the filenames it includes would make it possible for > > some tool to take the crash report and zip up a reproducer. > > > > (This probably needs the ability to run all of the handlers for the current > > thread, not just the top of the stack, since we acquire the relevant > > preamble later) > > How critical do you think this is for debugging crashes in practice? > > For us it would likely be of utmost importance as our VSCode extension routes > requests to [Glean](https://glean.software) until local clangd finishes the > compilation, so clangd ends up mostly being used for modified files, and the > crashes might be unreproducible on the original source… > > > I'd probably suggest leaving this out of the initial patch entirely so we > > can focus on getting the rest of the mechanism right. > > Alright, will remove the file contents with the next update. > > > In any case, we might want full file dumps to be off by default outside of > > environments where anyone is likely to actually try to get the reproducers. > > What would be the best way to re-add it in a subsequent revision: make it > configurable via a command line option, or via an environment variable? And > if it's just an env var, maybe we could include it conditionally in this > revision? > > > If we're using a preamble then just dumping all the filenames it includes > > would make it possible for some tool to take the crash report and zip up a > > reproducer. > > We had discussions within the team about potentially implementing a similar > tool, but the rough idea was to rely on VCS to collect the changes as the > delta would normally be smaller than bundling all the headers that the TU > includes. But if the VCS-agnostic approach would be simpler and more portable > — I'm all for it :) > What would be the best way to re-add it in a subsequent revision: make it > configurable via a command line option, or via an environment variable? And > if it's just an env var, maybe we could include it conditionally in this > revision? Yeah an env var seems tempting to me. I can't put my finger on *why* it feels like an appropriate way to configure this vs a flag, but at least not having to plumb it around is nice. I'd be happy with `if (getenv("CLANGD_CRASH_DUMP_SOURCE"))` or something like that in this patch. > the rough idea was to rely on VCS to collect the changes Right, this is probably why we don't have such a thing yet - at work we can just dump the VCS info and be done. Being able to get self-contained bug reports from more diverse environments would be nice to have, but hasn't been urgent. ================ Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:15 + +static llvm::sys::ThreadLocal<ThreadSignalHandlerCallback> CurrentCallback; + ---------------- 0x1eaf wrote: > sammccall wrote: > > sammccall wrote: > > > sammccall wrote: > > > > we assume `thread_local` in clangd (see support/Context.ccp) > > > > > > > > It's possible that llvm::sys::ThreadLocal happens to play nicer with > > > > signal handlers in practice (none of these are completely safe!), but > > > > unless we've seen some concrete evidence I'd rather just have `static > > > > thread_local ThreadSignalHandlerCallback*` here > > > we can quite easily support a stack of handlers, such that we dump *all* > > > living handlers on the crashing thread. > > > > > > just give each ThreadSignalHandler a `ThreadSignalHandler* Next = > > > nullptr` member, and both the constructor and destructor `swap(Next, > > > CurrentCallback)`. Then CurrentCallback points to the head of a linked > > > list of handlers. > > > > > > Not sure if you want to have that in this patch, but it's pretty simple > > > so I wouldn't object. > > Sorry, just realized you already suggested the stacking in the patch > > description! > Having `Next` pointer chains makes me worried about out of order destruction, > but if we are removing the move constructor and making the crash handler > object scope-bound always, that could actually work. But if we want to keep > the possibility of storing the handler in an object to preserve it across > method calls, I guess we could just make this a doubly-linked list and cut > out the link gracefully. I think simplicity is probably more important than being totally flexible here, so I'd just forbid moving and out-of-order destruction. If we end up wanting to move these crash handlers across scopes I'd suggest we just wrap them in `unique_ptr` in those cases. (And make it the creator's job to ensure their lifetimes nest properly). ================ Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:22 + ThreadSignalHandlerCallback *Callback = CurrentCallback.get(); + // TODO: ignore non thread local signals like SIGTERM + if (Callback) { ---------------- 0x1eaf wrote: > sammccall wrote: > > I think we should do this already, probably just starting with an allowlist > > of `SIGBUS, SIGFPE, SIGILL, SIGSEGV` (these are signals that are delivered > > to the thread that triggered them per the linux docs). > > > > I'm also wondering how portable this assumption/mechanism is, e.g. whether > > we should enable it for certain platforms only. I guess it's probably true > > for unixy things, and apparently mostly true for windows too[1]? I'm a > > little concerned *something* may go wrong on windows, though. > > > > [1] > > https://stackoverflow.com/questions/15879342/which-thread-are-signal-handlers-e-g-signalsigint-crtl-c-called-on > I've just looked into adding filtering by signal number and there doesn't > seem to be a way to do this portably in LLVM. But it looks like it might be > safe enough to skip the filtering at all: > * From a cursory look at Windows `Signals.inc` it looks like the “signal” > handling is implemented via > [SetUnhandledExceptionFilter()](https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-setunhandledexceptionfilter) > which is thread-directed, and the `SetConsoleCtrlHandler()` callback [does > not > call](https://reviews.llvm.org/source/llvm-github/browse/main/llvm/lib/Support/Windows/Signals.inc$843-846) > the registered handlers. > * And on Unix platforms the `AddSignalHandler()` callbacks are called > [only](https://reviews.llvm.org/source/llvm-github/browse/main/llvm/include/llvm/Support/Signals.h$62-64) > for > [KillSigs](https://reviews.llvm.org/source/llvm-github/browse/main/llvm/lib/Support/Unix/Signals.inc$217) > out of which only `SIGQUIT` seem to be process-directed and would be > delivered to [any > thread](https://man7.org/linux/man-pages/man7/signal.7.html) that is not > blocking it, but if the thread gets delivered to has a `ThreadCrashReporter` > installed during the interrupt — it seems to make sense to run the handler > and let it print the thread-local context information (at least there seems > to be no harm in doing so). > > What do you think? This sounds fine to me. In a perfect world I'd rather SIGQUIT didn't dump the state of an essentially-random thread (we always have lots). But SIGQUIT isn't the common case, it's confusing but not otherwise harmful, and it seems like it'd be hard to fix. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109506/new/ https://reviews.llvm.org/D109506 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits