Thanks Jim, I'll give option #1 a try and upload a new patch. think your divisions into options is too simplistic. lldb is built of > many different subsystems, and a lot of "inconsistent state" can be > confined to one subsystem or other. So for instance, if one CU's debug is > unrecoverably damaged, we can just pretend that that .o file was built w/o > debug info and the user can do lots of useful debugging still. It is much > better if we can say "Found irrecoverable errors in this CU, dumping debug > info" and continue rather than crashing because one CU that the user may > not even care about is all horked up. Similarly a single JIT execution can > get itself all balled up and yet if we can discard the work it did we can > continue doing lots of other things.
The key here is the recoverable part - if you can isolate the state and recover (throw-away, rollback), then I agree, that can be a preferable approach. My point is about truly irrecoverable states - WTF states where you don't expect to get into, can't do anything meaningful in response and will likely cause cascading issues. As a side note, I would *love* a generic roll-back mechanism in LLDB. Not only that it can give us a viable alternative to failing fast, but it can also allow things like LLDB-command interruption (which is one thing I'd like to take a closer look when I get a chance) On Mon, Sep 11, 2017 at 10:55 AM, Jim Ingham <jing...@apple.com> wrote: > > > On Sep 11, 2017, at 10:24 AM, Leonard Mosescu <mose...@google.com> > wrote: > > > > I think everyone is in agreement that input should be sanitized, and > ill-formed input should not trigger an abrupt termination if there's a way > to recover (which is not always feasible). > > > > The conversation started around fail-fast for inconsistent and > unrecoverable state, where I think the math is very clear in this case. > Using one the the comments about the long-lasting debugging sessions as > example, let's say that we have a bug and the debugger state gets corrupted > after 1h of debugging: > > 1. debugger crashes immediately: lost 1h of debugging -> angry bug > report (or ideally automated bug report) > > 2. debugger limps along for a few more hours behaving erratically (not > necessarily in an obvious fashion) > > a. developer gives up after many hours or debugging without > understanding what's going on > > b. developer figures out that the debugger is unreliable > > > > While it is choosing between two evils, I think #1 is clearly the right > choice: #2 results in more lost time and frustration, less chance of > getting a bug report and can result in a general impression that the > debugger is not to be trusted (which for a developer tool is infinitely > worse than an unstable tool) > > I think your divisions into options is too simplistic. lldb is built of > many different subsystems, and a lot of "inconsistent state" can be > confined to one subsystem or other. So for instance, if one CU's debug is > unrecoverably damaged, we can just pretend that that .o file was built w/o > debug info and the user can do lots of useful debugging still. It is much > better if we can say "Found irrecoverable errors in this CU, dumping debug > info" and continue rather than crashing because one CU that the user may > not even care about is all horked up. Similarly a single JIT execution can > get itself all balled up and yet if we can discard the work it did we can > continue doing lots of other things. > > So I think there are lots of places where you can get into a bad state > locally, but that doesn't mean #2 is the outcome. Sometimes you even get > into a state where a specific target has done something totally > incomprehensible. In that case, you may have to jettison that target, but > that might not be the only target/debugger in this lldb session. So you > still wouldn't want to quit, those targets are probably fine. > > As you note, this is a little off from the original discussion which was > more about setting up and enforcing logic flows using the lldb_private > API's. If you really have to call B only after calling A, and you can > clearly signal to yourself that A was called, then it is okay put an assert > there. But in your case, CanResume seems like an optimization - and you > are clearly using it as such to avoid having to fix something deeper down > in lldb. But in that case DoResume can error for other reasons, and so the > system should deal with it. So adding another hard requirement was > inelegant and exiting if it wasn't met just made that more obvious... > > > > > -------------------------------------- > > > > Going back to the original fix, I think most observations are very > pertinent - yes, the result is not ideal. This is because the current > logic, changing the state in Process::Resume() before the whole operation > completes successfully is ... unfortunate. So here are the options I see: > > > > 1. Fix Process::Resume() so we only change the state late, after > everything succeds > > 2. Rollback state change if anything fails while resuming > > 3. Patch the specific case of "resume is not supported" > > 4. Do nothing > > > > The proposed fix is #3, since it's the less intrusive (ie. doesn't > change the core resume logic) patch for a very specific and narrow case. > IMO the "right" fix is #1, but it seems a more radical change. #2 seems > tempting in terms of code changes, but I find it less compelling overall > since rollbacks are generally fragile if the code is not designed with that > in mind. > > > > So, LLDB experts, what do you think? I'd be happy to go with #1 if > someone with experience in this area can review the change. Anything else > that I missed? > > > > Seems like #1 is the right choice if you're willing to give it a try. You > are right that this code is tricky, but it would be worth the effort to fix > it right, if for no other reason than to get some other eyes on it. I'm > happy to answer any questions as you go along. This code has ping-ponged > back and forth between Greg and me, so we should both be able to help you > out. > > Jim > > > Thanks! > > > > > > > > > > > > > > On Sun, Sep 10, 2017 at 1:52 PM, Jim Ingham via lldb-commits < > lldb-commits@lists.llvm.org> wrote: > > > >> On Sep 9, 2017, at 7:38 PM, Zachary Turner <ztur...@google.com> wrote: > >> > >> Sure, but reading a core file involves handling user input. Obviously > that has to be sanitized and you can't crash on user input. I don't think > there's ever been any disagreement about that. On the other hand, if you > just type an expression in the debugger and expect an answer back, after > clang says the ast is valid, the whole stack should be asserts all the way > down, because everything is validated > > > > Yes, but even that ends up not matching with the facts on the ground, > because the debug information for types as used by lldb has to be produced > from debug info, and the debug info can be malformed and not infrequently > is. So at some point clang is going to ask for something that it assumes > must always be available because if it had read the types in from header > files it would have asserted at an early stage but when it is fed types > from debug info incrementally - which is the only efficient way the > debugger can do it - is going to cause something to look wrong at a point > where that “shouldn’t be possible.” clang's rigid structure means it can’t > cope with this and will crash on something the user didn’t themselves get > wrong. > > > > According to your classification, we should really consider debug > information “user input” as well. But that means we’re wiring “user input” > into a pretty deep place in clang and you’d probably have to weaken your > strict assert model to handle that. And it makes me wonder if there’s > anything the debugger consumes that wouldn’t fall into this category? > > > > For instance, we also let plugins provide threads and their stacks, and > we try pretty hard not to crash and burn if these plugins get it a little > wrong. So process state really needs to be treated as “user data” as > well. And since we use SPI’s for most of this info that we are pretty much > the only clients of, they can be expected to be flakey as well, again a > fact that we should not inflict on users of the debugger if we can help it. > > > > The answer of trying to do everything dangerous out of process I think > will make a slow and fragile architecture. And while I agree with the > llvm.org decision not to use C++ exceptions because they make reasoning > about the program more difficult, replacing that with SIGSEGV as the > exception mechanism of choice seems really weak. You are not the first > person to point out that we could use the crash-protected threads, but I’m > remain very unenthusiastic about this approach... > > > > Moreover, there are a bunch of problematic areas in lldb, but I don’t > think any of them are problematic for the reasons that you are describing > here. There are areas of weak memory management in the data formatters & > some places where we don’t manage locking right in handling process stops & > the reconstruction of program state after that, and those cause the vast > majority of the infrequent but annoying crashes we see. I don’t myself > believe that putting in this assert everywhere architecture would pay off > in reduced crashes to compensate for the brittleness it would introduce. > > > > Finally, and most crucially, no failure of any expression evaluation and > no malformed type is worth taking down a debug session, and it’s our > responsibility working on lldb to make sure it doesn't. If the makes our > job a little harder, so be it. This was Jason’s point earlier, but it is > worth reiterating because it makes lldb a very different kind of program > from all the other programs in the suite collected under the llvm project. > If the compiler gets into an inconsistent state compiling a source file, > it’s fine for it to just exit. It wasn’t going to do any more useful work > anyway. That is definitely NOT true of lldb, and makes reasonable the > notion that general solutions that seem appropriate for the other tools are > not appropriate for lldb. > > > > > > Jim > > > > > >> On Sat, Sep 9, 2017 at 6:53 PM Jim Ingham <jing...@apple.com> wrote: > >> I think we are talking at cross purposes. Seems to me you are saying > “If we can assert that the answers to questions I ask must always > copacetic, then we can express that in software, and that will make things > so much easier". I’m saying "my experience of the data debuggers have to > deal with is such that assuming such assertions will lead you to over-react > to such errors. Instead you should write your code so that if somebody > gives you bad data you don’t fall over allowing the people who called you > to decide how important the error was.” Every core file that was written > out by OS X for years had a section that was ill-formed. Asserting when > you get an ill-formed object file might seem a good way to ensure that you > don’t have to make guesses that might lead you astray. But the people who > need to load core files on OS X are rightly unmoved by such arguments if > lldb disappears out from under them when reading in core files. > >> > >> Jim > >> > >> > >> > >>> On Sep 9, 2017, at 1:31 PM, Zachary Turner <ztur...@google.com> wrote: > >>> > >>> > >>> > >>> On Sat, Sep 9, 2017 at 12:04 PM Jim Ingham <jing...@apple.com> wrote: > >>> > >>> I disagree here. If you can reasonably unwind from an error you > should do so even if you can’t figure out how you could have gotten an > answer you didn’t expect. If an API is returning a pointer to something > you should assume it might return nullptr unless the API explicitly states > otherwise. > >>> But that's exactly what an assert is. It's an explicit statement by > the API about what should happen. Which is why by adding them liberally, > these assumptions can then be propagated all the way up through many layers > of the code, vastly simplifying the codebase. > >>> > >>> if you have > >>> > >>> void *foo(int x) { > >>> // do some stuff > >>> > >>> assert(x < 0 || foo != nullptr); > >>> } > >>> > >>> Then you're documenting that if x is greater than 0, the caller > doesn't need to check the return value for nullptr. Now instead of this: > >>> > >>> void *bar(unsigned x) { > >>> void *ptr = foo(x); > >>> if (!ptr) { > >>> // log an error > >>> return nullptr; > >>> } > >>> return ptr; > >>> } > >>> > >>> You just have > >>> > >>> void *bar(unsigned x) { > >>> void *ptr = foo(x); > >>> assert(x); > >>> return x; > >>> } > >>> > >>> And now the caller of bar doesn't have to check either. The code has > greatly reduced complexity due to the butterfly efflect of propagating > these assumptions up. > >>> > >>> This is a simple example but the point is that building assumptions > into your API is a good thing, because you can enforce them and it vastly > simplifies the code. > >> > > > > > > _______________________________________________ > > lldb-commits mailing list > > lldb-commits@lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits > > > > > >
_______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits