Hi Stephan, On Fri, 2015-09-11 at 16:04 +0200, Stephan Bergmann wrote: > But I doubt we want to make our code base more capricious than > necessary, to shield us from behavior exhibited by the Windows debugging > environment.
Ah ;-) well - I saw std::abort not aborting, and I added that to make it actually - die ;-> you recall the discussion: _exit() was the solution. I rather suspect that while the process is being debugged, and the dialog is up that other threads are making progress - anyhow - the windows behavior is somewhat unusual here. > > Which thread would you expect the signal to be delivered >> to (I wonder) - it's all a bit interesting I suspect. > > The case should be pretty clear for a synchronous, std::abort-generated > SIGABRT (hopefully even on Windows). I don't find much that's terribly clear about signal handling, and/or the cross-thread synchronization mess that follows it around under the covers =) > > My hope was that the watchdog would carry on working in these cases & > > kill us again more aggressively if necessary if people insist on > > ignoring these guys. > > But how should it do that? Even if the SIGABRT-handling were done on > another thread, the watchdog thread just couldn't progress past the > std::abort() (notwithstanding cheating in a debugging environment). Good point =) so best to start a new watchdog instance in the abort handler then. > So there's only a single instance of the watchdog thread supposed to > ever run. The odd "static bool bFired" in OpenGLWatchdogThrad::execute > had fooled me to assume otherwise (for why else should the variable have > static storage duration). Ah - this was a reasonably harmless way to avoid using a variable in a wider scope ;-) given that this class is a singleton. > Anyway, generalizing that "watchdog the OpenGLWatchdogThread, in case > our signal handler gets stuck" idea obviously leads to a "watchdog our > signal handler, in case it gets stuck" feature, i.e., spawn a thread > early in our signal handler (assuming spawning an additional thread > doesn't make our violation of what a signal handler is supposed to be > allowed to do any worse), which will call _exit after a fixed amount of > time. Actually, I think that's a great idea =) I've ~often seen traces out of bugzilla for hung processes (on Linux at least) where the hang was in a crash from the recovery process. That leads to these unfortunate dead windows lingering around etc. and upset users. > The question just is, what is a reasonable value for that amount > of time. Make it too short, and you'll prevent recovery of documents > that take long to save and for which our document recovery would > otherwise have happened to work fine. Right; hmm =) several of the traces I remember seeing were nasty ones where eg. the malloc arena mutex was locked - making it rather hard to make progress ;-) or we were blocked trying to get the solar-mutex. I guess if we were truly 31337 we would hook some interaction handler that had a global progress-bar hook (so we would see the emergency 'save' making progress), and another that would ignore yielding waiting for user-interaction (or do we not ask questions during the crash handler - I forget - there is plenty of GUI stuff there still). It might work: I'd say if there is no progress-bar type update from a file filter in 5 seconds of any kind, it is "really game-over" =) > And the true route ahead of course is to no longer put our document > recovery strategy at the mercy of a brittle, undefined-behavior--riddled > signal handler. Sure =) far more ideal would be to stream the keystroke / edits that happen on the document and fsync them to an append-only file ever few keystrokes, and then re-play them on crash-recovery =) so "nothing can ever be lost" - would be ideal. Only problem is - we need to implement something like a collaborative editor first I think =) ATB, Michael. -- michael.me...@collabora.com <><, Pseudo Engineer, itinerant idiot _______________________________________________ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice