+Mark Mentovai On Mon, Sep 11, 2017 at 12:24 PM, Leonard Mosescu <mose...@google.com> wrote:
> Thanks Jim, I'll give option #1 a try and upload a new patch. > > think your divisions into options is too simplistic. lldb is built of >> many different subsystems, and a lot of "inconsistent state" can be >> confined to one subsystem or other. So for instance, if one CU's debug is >> unrecoverably damaged, we can just pretend that that .o file was built w/o >> debug info and the user can do lots of useful debugging still. It is much >> better if we can say "Found irrecoverable errors in this CU, dumping debug >> info" and continue rather than crashing because one CU that the user may >> not even care about is all horked up. Similarly a single JIT execution can >> get itself all balled up and yet if we can discard the work it did we can >> continue doing lots of other things. > > > The key here is the recoverable part - if you can isolate the state and > recover (throw-away, rollback), then I agree, that can be a preferable > approach. My point is about truly irrecoverable states - WTF states where > you don't expect to get into, can't do anything meaningful in response and > will likely cause cascading issues. > > As a side note, I would *love* a generic roll-back mechanism in LLDB. Not > only that it can give us a viable alternative to failing fast, but it can > also allow things like LLDB-command interruption (which is one thing I'd > like to take a closer look when I get a chance) > > On Mon, Sep 11, 2017 at 10:55 AM, Jim Ingham <jing...@apple.com> wrote: > >> >> > On Sep 11, 2017, at 10:24 AM, Leonard Mosescu <mose...@google.com> >> wrote: >> > >> > I think everyone is in agreement that input should be sanitized, and >> ill-formed input should not trigger an abrupt termination if there's a way >> to recover (which is not always feasible). >> > >> > The conversation started around fail-fast for inconsistent and >> unrecoverable state, where I think the math is very clear in this case. >> Using one the the comments about the long-lasting debugging sessions as >> example, let's say that we have a bug and the debugger state gets corrupted >> after 1h of debugging: >> > 1. debugger crashes immediately: lost 1h of debugging -> angry bug >> report (or ideally automated bug report) >> > 2. debugger limps along for a few more hours behaving erratically (not >> necessarily in an obvious fashion) >> > a. developer gives up after many hours or debugging without >> understanding what's going on >> > b. developer figures out that the debugger is unreliable >> > >> > While it is choosing between two evils, I think #1 is clearly the right >> choice: #2 results in more lost time and frustration, less chance of >> getting a bug report and can result in a general impression that the >> debugger is not to be trusted (which for a developer tool is infinitely >> worse than an unstable tool) >> >> I think your divisions into options is too simplistic. lldb is built of >> many different subsystems, and a lot of "inconsistent state" can be >> confined to one subsystem or other. So for instance, if one CU's debug is >> unrecoverably damaged, we can just pretend that that .o file was built w/o >> debug info and the user can do lots of useful debugging still. It is much >> better if we can say "Found irrecoverable errors in this CU, dumping debug >> info" and continue rather than crashing because one CU that the user may >> not even care about is all horked up. Similarly a single JIT execution can >> get itself all balled up and yet if we can discard the work it did we can >> continue doing lots of other things. >> >> So I think there are lots of places where you can get into a bad state >> locally, but that doesn't mean #2 is the outcome. Sometimes you even get >> into a state where a specific target has done something totally >> incomprehensible. In that case, you may have to jettison that target, but >> that might not be the only target/debugger in this lldb session. So you >> still wouldn't want to quit, those targets are probably fine. >> >> As you note, this is a little off from the original discussion which was >> more about setting up and enforcing logic flows using the lldb_private >> API's. If you really have to call B only after calling A, and you can >> clearly signal to yourself that A was called, then it is okay put an assert >> there. But in your case, CanResume seems like an optimization - and you >> are clearly using it as such to avoid having to fix something deeper down >> in lldb. But in that case DoResume can error for other reasons, and so the >> system should deal with it. So adding another hard requirement was >> inelegant and exiting if it wasn't met just made that more obvious... >> >> > >> > -------------------------------------- >> > >> > Going back to the original fix, I think most observations are very >> pertinent - yes, the result is not ideal. This is because the current >> logic, changing the state in Process::Resume() before the whole operation >> completes successfully is ... unfortunate. So here are the options I see: >> > >> > 1. Fix Process::Resume() so we only change the state late, after >> everything succeds >> > 2. Rollback state change if anything fails while resuming >> > 3. Patch the specific case of "resume is not supported" >> > 4. Do nothing >> > >> > The proposed fix is #3, since it's the less intrusive (ie. doesn't >> change the core resume logic) patch for a very specific and narrow case. >> IMO the "right" fix is #1, but it seems a more radical change. #2 seems >> tempting in terms of code changes, but I find it less compelling overall >> since rollbacks are generally fragile if the code is not designed with that >> in mind. >> > >> > So, LLDB experts, what do you think? I'd be happy to go with #1 if >> someone with experience in this area can review the change. Anything else >> that I missed? >> > >> >> Seems like #1 is the right choice if you're willing to give it a try. >> You are right that this code is tricky, but it would be worth the effort to >> fix it right, if for no other reason than to get some other eyes on it. >> I'm happy to answer any questions as you go along. This code has >> ping-ponged back and forth between Greg and me, so we should both be able >> to help you out. >> >> Jim >> >> > Thanks! >> > >> > >> > >> > >> > >> > >> > On Sun, Sep 10, 2017 at 1:52 PM, Jim Ingham via lldb-commits < >> lldb-commits@lists.llvm.org> wrote: >> > >> >> On Sep 9, 2017, at 7:38 PM, Zachary Turner <ztur...@google.com> wrote: >> >> >> >> Sure, but reading a core file involves handling user input. Obviously >> that has to be sanitized and you can't crash on user input. I don't think >> there's ever been any disagreement about that. On the other hand, if you >> just type an expression in the debugger and expect an answer back, after >> clang says the ast is valid, the whole stack should be asserts all the way >> down, because everything is validated >> > >> > Yes, but even that ends up not matching with the facts on the ground, >> because the debug information for types as used by lldb has to be produced >> from debug info, and the debug info can be malformed and not infrequently >> is. So at some point clang is going to ask for something that it assumes >> must always be available because if it had read the types in from header >> files it would have asserted at an early stage but when it is fed types >> from debug info incrementally - which is the only efficient way the >> debugger can do it - is going to cause something to look wrong at a point >> where that “shouldn’t be possible.” clang's rigid structure means it can’t >> cope with this and will crash on something the user didn’t themselves get >> wrong. >> > >> > According to your classification, we should really consider debug >> information “user input” as well. But that means we’re wiring “user input” >> into a pretty deep place in clang and you’d probably have to weaken your >> strict assert model to handle that. And it makes me wonder if there’s >> anything the debugger consumes that wouldn’t fall into this category? >> > >> > For instance, we also let plugins provide threads and their stacks, and >> we try pretty hard not to crash and burn if these plugins get it a little >> wrong. So process state really needs to be treated as “user data” as >> well. And since we use SPI’s for most of this info that we are pretty much >> the only clients of, they can be expected to be flakey as well, again a >> fact that we should not inflict on users of the debugger if we can help it. >> > >> > The answer of trying to do everything dangerous out of process I think >> will make a slow and fragile architecture. And while I agree with the >> llvm.org decision not to use C++ exceptions because they make reasoning >> about the program more difficult, replacing that with SIGSEGV as the >> exception mechanism of choice seems really weak. You are not the first >> person to point out that we could use the crash-protected threads, but I’m >> remain very unenthusiastic about this approach... >> > >> > Moreover, there are a bunch of problematic areas in lldb, but I don’t >> think any of them are problematic for the reasons that you are describing >> here. There are areas of weak memory management in the data formatters & >> some places where we don’t manage locking right in handling process stops & >> the reconstruction of program state after that, and those cause the vast >> majority of the infrequent but annoying crashes we see. I don’t myself >> believe that putting in this assert everywhere architecture would pay off >> in reduced crashes to compensate for the brittleness it would introduce. >> > >> > Finally, and most crucially, no failure of any expression evaluation >> and no malformed type is worth taking down a debug session, and it’s our >> responsibility working on lldb to make sure it doesn't. If the makes our >> job a little harder, so be it. This was Jason’s point earlier, but it is >> worth reiterating because it makes lldb a very different kind of program >> from all the other programs in the suite collected under the llvm project. >> If the compiler gets into an inconsistent state compiling a source file, >> it’s fine for it to just exit. It wasn’t going to do any more useful work >> anyway. That is definitely NOT true of lldb, and makes reasonable the >> notion that general solutions that seem appropriate for the other tools are >> not appropriate for lldb. >> > >> > >> > Jim >> > >> > >> >> On Sat, Sep 9, 2017 at 6:53 PM Jim Ingham <jing...@apple.com> wrote: >> >> I think we are talking at cross purposes. Seems to me you are saying >> “If we can assert that the answers to questions I ask must always >> copacetic, then we can express that in software, and that will make things >> so much easier". I’m saying "my experience of the data debuggers have to >> deal with is such that assuming such assertions will lead you to over-react >> to such errors. Instead you should write your code so that if somebody >> gives you bad data you don’t fall over allowing the people who called you >> to decide how important the error was.” Every core file that was written >> out by OS X for years had a section that was ill-formed. Asserting when >> you get an ill-formed object file might seem a good way to ensure that you >> don’t have to make guesses that might lead you astray. But the people who >> need to load core files on OS X are rightly unmoved by such arguments if >> lldb disappears out from under them when reading in core files. >> >> >> >> Jim >> >> >> >> >> >> >> >>> On Sep 9, 2017, at 1:31 PM, Zachary Turner <ztur...@google.com> >> wrote: >> >>> >> >>> >> >>> >> >>> On Sat, Sep 9, 2017 at 12:04 PM Jim Ingham <jing...@apple.com> wrote: >> >>> >> >>> I disagree here. If you can reasonably unwind from an error you >> should do so even if you can’t figure out how you could have gotten an >> answer you didn’t expect. If an API is returning a pointer to something >> you should assume it might return nullptr unless the API explicitly states >> otherwise. >> >>> But that's exactly what an assert is. It's an explicit statement by >> the API about what should happen. Which is why by adding them liberally, >> these assumptions can then be propagated all the way up through many layers >> of the code, vastly simplifying the codebase. >> >>> >> >>> if you have >> >>> >> >>> void *foo(int x) { >> >>> // do some stuff >> >>> >> >>> assert(x < 0 || foo != nullptr); >> >>> } >> >>> >> >>> Then you're documenting that if x is greater than 0, the caller >> doesn't need to check the return value for nullptr. Now instead of this: >> >>> >> >>> void *bar(unsigned x) { >> >>> void *ptr = foo(x); >> >>> if (!ptr) { >> >>> // log an error >> >>> return nullptr; >> >>> } >> >>> return ptr; >> >>> } >> >>> >> >>> You just have >> >>> >> >>> void *bar(unsigned x) { >> >>> void *ptr = foo(x); >> >>> assert(x); >> >>> return x; >> >>> } >> >>> >> >>> And now the caller of bar doesn't have to check either. The code has >> greatly reduced complexity due to the butterfly efflect of propagating >> these assumptions up. >> >>> >> >>> This is a simple example but the point is that building assumptions >> into your API is a good thing, because you can enforce them and it vastly >> simplifies the code. >> >> >> > >> > >> > _______________________________________________ >> > lldb-commits mailing list >> > lldb-commits@lists.llvm.org >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits >> > >> > >> >> >
_______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits