sammccall added a comment.

Well, thanks for such a well-thought-out patch :-) I have to admit my first 
reaction was terror, but this seems solid to me.



================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1381
+  auto &OS = llvm::errs();
+  OS << "Signalled during AST action:\n";
+  auto &Command = FileInputs.CompileCommand;
----------------
0x1eaf wrote:
> sammccall wrote:
> > the name of the action is available as CurrentRequest->Action, which I 
> > think is safe to either pass into this function or read directly here.
> I've noticed the the action name is always hardcoded as “Build AST 
> ({version})” and just logged the version directly below, but I guess it's not 
> forward compatible, so I'm going to log both…
> the action name is always hardcoded as “Build AST ({version})”

This shouldn't *always* be the case, there should be entries like "Hover" etc 
too, corresponding to `WorkScheduler->runWithAST("Hover", ...)` in 
ClangdServer.cpp.

However the "build AST" actions are the ones that run the clang parser, so 
they're the slowest and crashiest for sure!


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1392
+  OS << "Contents:\n";
+  OS << FileInputs.Contents << "\n";
+}
----------------
0x1eaf wrote:
> sammccall wrote:
> > sammccall wrote:
> > > hmm, this is potentially really useful (could gather full reproducers) 
> > > but also is going to dominate the output and make it annoying to see the 
> > > stack traces/other details. I'm worried this may make this feature 
> > > net-negative for a majority of users, who won't find the right parts of 
> > > the output to report...
> > > 
> > > How critical do you think this is for debugging crashes in practice? (I 
> > > suspect fairly important!)
> > > Since we're taking liberties with async-signal-safety anyway, it might be 
> > > possible to get this sent to a tempfile, which might be more useful. In 
> > > any case, we might want full file dumps to be off by default outside of 
> > > environments where anyone is likely to actually try to get the 
> > > reproducers.
> > > 
> > > I'd probably suggest leaving this out of the initial patch entirely so we 
> > > can focus on getting the rest of the mechanism right.
> > another thought along the lines of reproducers (but nothing for this 
> > patch)...
> > 
> > If we're working on a file and have a preamble, then the headers that make 
> > up the preamble are relevant. In practice it's hard to reproduce bug 
> > reports we get because we need all their headers. If we're using a preamble 
> > then just dumping all the filenames it includes would make it possible for 
> > some tool to take the crash report and zip up a reproducer.
> > 
> > (This probably needs the ability to run all of the handlers for the current 
> > thread, not just the top of the stack, since we acquire the relevant 
> > preamble later)
> > How critical do you think this is for debugging crashes in practice?
> 
> For us it would likely be of utmost importance as our VSCode extension routes 
> requests to [Glean](https://glean.software) until local clangd finishes the 
> compilation, so clangd ends up mostly being used for modified files, and the 
> crashes might be unreproducible on the original source…
> 
> > I'd probably suggest leaving this out of the initial patch entirely so we 
> > can focus on getting the rest of the mechanism right.
> 
> Alright, will remove the file contents with the next update.
> 
> > In any case, we might want full file dumps to be off by default outside of 
> > environments where anyone is likely to actually try to get the reproducers.
> 
> What would be the best way to re-add it in a subsequent revision: make it 
> configurable via a command line option, or via an environment variable? And 
> if it's just an env var, maybe we could include it conditionally in this 
> revision?
> 
> > If we're using a preamble then just dumping all the filenames it includes 
> > would make it possible for some tool to take the crash report and zip up a 
> > reproducer.
> 
> We had discussions within the team about potentially implementing a similar 
> tool, but the rough idea was to rely on VCS to collect the changes as the 
> delta would normally be smaller than bundling all the headers that the TU 
> includes. But if the VCS-agnostic approach would be simpler and more portable 
> — I'm all for it :)
> What would be the best way to re-add it in a subsequent revision: make it 
> configurable via a command line option, or via an environment variable? And 
> if it's just an env var, maybe we could include it conditionally in this 
> revision?

Yeah an env var seems tempting to me. I can't put my finger on *why* it feels 
like an appropriate way to configure this vs a flag, but at least not having to 
plumb it around is nice.
I'd be happy with `if (getenv("CLANGD_CRASH_DUMP_SOURCE"))` or something like 
that in this patch.

> the rough idea was to rely on VCS to collect the changes

Right, this is probably why we don't have such a thing yet - at work we can 
just dump the VCS info and be done. Being able to get self-contained bug 
reports from more diverse environments would be nice to have, but hasn't been 
urgent.


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:15
+
+static llvm::sys::ThreadLocal<ThreadSignalHandlerCallback> CurrentCallback;
+
----------------
0x1eaf wrote:
> sammccall wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > we assume `thread_local` in clangd (see support/Context.ccp)
> > > > 
> > > > It's possible that llvm::sys::ThreadLocal happens to play nicer with 
> > > > signal handlers in practice (none of these are completely safe!), but 
> > > > unless we've seen some concrete evidence I'd rather just have `static 
> > > > thread_local ThreadSignalHandlerCallback*` here
> > > we can quite easily support a stack of handlers, such that we dump *all* 
> > > living handlers on the crashing thread.
> > > 
> > > just give each ThreadSignalHandler a `ThreadSignalHandler* Next = 
> > > nullptr` member, and both the constructor and destructor `swap(Next, 
> > > CurrentCallback)`. Then CurrentCallback points to the head of a linked 
> > > list of handlers.
> > > 
> > > Not sure if you want to have that in this patch, but it's pretty simple 
> > > so I wouldn't object.
> > Sorry, just realized you already suggested the stacking in the patch 
> > description!
> Having `Next` pointer chains makes me worried about out of order destruction, 
> but if we are removing the move constructor and making the crash handler 
> object scope-bound always, that could actually work. But if we want to keep 
> the possibility of storing the handler in an object to preserve it across 
> method calls, I guess we could just make this a doubly-linked list and cut 
> out the link gracefully.
I think simplicity is probably more important than being totally flexible here, 
so I'd just forbid moving and out-of-order destruction.
If we end up wanting to move these crash handlers across scopes I'd suggest we 
just wrap them in `unique_ptr` in those cases. (And make it the creator's job 
to ensure their lifetimes nest properly).


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:22
+      ThreadSignalHandlerCallback *Callback = CurrentCallback.get();
+      // TODO: ignore non thread local signals like SIGTERM
+      if (Callback) {
----------------
0x1eaf wrote:
> sammccall wrote:
> > I think we should do this already, probably just starting with an allowlist 
> > of `SIGBUS, SIGFPE, SIGILL, SIGSEGV` (these are signals that are delivered 
> > to the thread that triggered them per the linux docs).
> > 
> > I'm also wondering how portable this assumption/mechanism is, e.g. whether 
> > we should enable it for certain platforms only. I guess it's probably true 
> > for unixy things, and apparently mostly true for windows too[1]? I'm a 
> > little concerned *something* may go wrong on windows, though.
> > 
> > [1] 
> > https://stackoverflow.com/questions/15879342/which-thread-are-signal-handlers-e-g-signalsigint-crtl-c-called-on
> I've just looked into adding filtering by signal number and there doesn't 
> seem to be a way to do this portably in LLVM. But it looks like it might be 
> safe enough to skip the filtering at all:
> * From a cursory look at Windows `Signals.inc` it looks like the “signal” 
> handling is implemented via 
> [SetUnhandledExceptionFilter()](https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-setunhandledexceptionfilter)
>  which is thread-directed, and the `SetConsoleCtrlHandler()` callback [does 
> not 
> call](https://reviews.llvm.org/source/llvm-github/browse/main/llvm/lib/Support/Windows/Signals.inc$843-846)
>  the registered handlers.
> * And on Unix platforms the `AddSignalHandler()` callbacks are called 
> [only](https://reviews.llvm.org/source/llvm-github/browse/main/llvm/include/llvm/Support/Signals.h$62-64)
>  for 
> [KillSigs](https://reviews.llvm.org/source/llvm-github/browse/main/llvm/lib/Support/Unix/Signals.inc$217)
>  out of which only `SIGQUIT` seem to be process-directed and would be 
> delivered to [any 
> thread](https://man7.org/linux/man-pages/man7/signal.7.html) that is not 
> blocking it, but if the thread gets delivered to has a `ThreadCrashReporter` 
> installed during the interrupt — it seems to make sense to run the handler 
> and let it print the thread-local context information (at least there seems 
> to be no harm in doing so).
> 
> What do you think?
This sounds fine to me.

In a perfect world I'd rather SIGQUIT didn't dump the state of an 
essentially-random thread (we always have lots). But SIGQUIT isn't the common 
case, it's confusing but not otherwise harmful, and it seems like it'd be hard 
to fix.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109506/new/

https://reviews.llvm.org/D109506

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to