Re: [lldb-dev] LLDB Evolution - Final Form

Mehdi Amini via lldb-dev Mon, 19 Sep 2016 14:21:07 -0700

> On Sep 19, 2016, at 2:02 PM, Greg Clayton via lldb-dev 
> <[email protected]> wrote:
> 
> 
>> On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev 
>> <[email protected]> wrote:
>> 
>> Following up with Kate's post from a few weeks ago, I think the dust has 
>> settled on the code reformat and it went over pretty smoothly for the most 
>> part.  So I thought it might be worth throwing out some ideas for where we 
>> go from here.  I have a large list of ideas (more ideas than time, sadly) 
>> that I've been collecting over the past few weeks, so I figured I would 
>> throw them out in the open for discussion.
>> 
>> I’ve grouped the areas for improvement into 3 high level categories.
>> 
>>      • De-inventing the wheel - We should use more code from LLVM, and 
>> delete code in LLDB where LLVM provides a solution.  In cases where there is 
>> an LLVM thing that is *similar* to what we need, we should extend the LLVM 
>> thing to support what we need, and then use it.  Following are some areas 
>> I've identified.  This list is by no means complete.  For each one, I've 
>> given a personal assessment of how likely it is to cause some (temporary) 
>> hiccups, how much it would help us in the long run, and how difficult it 
>> would be to do.  Without further ado:
>>              • Use llvm::Regex instead of lldb::Regex
>>                      • llvm::Regex doesn’t support enhanced mode.  Could we 
>> add support for this to llvm::Regex?
>>                      • Risk: 6
>>                      • Impact: 3
>>                      • Difficulty / Effort: 3  (5 if we have to add enhanced 
>> mode support)
> 
> As long as it supports all of the features, then this is fine. Otherwise we 
> will need to modify the regular expressions to work with the LLVM version or 
> back port any advances regex features we need into the LLVM regex parser. 
> 
>>              • Use llvm streams instead of lldb::StreamString
>>                      • Supports output re-targeting (stderr, stdout, 
>> std::string, etc), printf style formatting, and type-safe streaming 
>> operators.
>>                      • Interoperates nicely with many existing llvm utility 
>> classes
>>                      • Risk: 4
>>                      • Impact: 5
>>                      • Difficulty / Effort: 7
> 
> I have worked extensively with both C++ streams and with printf. I don't find 
> the type safe streaming operators to be readable and a great way to place 
> things into streams. Part of my objection to this will be quelled if the LLVM 
> streams are not stateful like C++ streams are (add an extra "std::hex" into 
> the stream and suddenly other things down stream start coming out in hex). As 
> long as printf functionality is in the LLVM streams I am ok with it.
> 
>>              • Use llvm::Error instead of lldb::Error
>>                      • llvm::Error is an error class that *requires* you to 
>> check whether it succeeded or it will assert.  In a way, it's similar to a 
>> C++ exception, except that it doesn't come with the performance hit 
>> associated with exceptions.  It's extensible, and can be easily extended to 
>> support the various ways LLDB needs to construct errors and error messages.
>>                      • Would need to first rename lldb::Error to LLDBError 
>> so that te conversion from LLDBError to llvm::Error could be done 
>> incrementally.
>>                      • Risk: 7
>>                      • Impact: 7
>>                      • Difficulty / Effort: 8
> 
> We must be very careful here. Nothing can crash LLDB and adding something 
> that will be known to crash in some cases can only be changed if there is a 
> test that can guarantee that it won't crash. assert() calls in a shared 
> library like LLDB are not OK and shouldn't fire. Even if they do, when 
> asserts are off, then it shouldn't crash LLDB. We have made efforts to only 
> use asserts in LLDB for things that really can't happen, but for things like 
> constructing a StringRef with NULL is one example of why I don't want to use 
> certain classes in LLVM. If we can remove the crash threat, then lets use 
> them. But many people firmly believe that asserts belong in llvm and clang 
> code and I don't agree.


I’m surprise by your aversion to assertions, what is your suggested 
alternative? Are you expecting to check and handle every possible invariants 
everywhere and recover (or signal an error) properly? That does not seem 
practical, and it seems to me that it'd lead to just involving UB (or breaking 
design invariant) without actually noticing it.

> Also many asserts are in header files so even if you build llvm and clang 
> with asserts off, but then build our project that uses LLVM with asserts on, 
> you get LLVM and clang asserts when you don't want them.

It is not really supported to use the LLVM C++ headers without assertions and 
link to a LLVM build with assertions (and vice-versa).


> 
>>              • StringRef instead of const char *, len everywhere
>>                      • Can do most common string operations in a way that is 
>> guaranteed to be safe.
>>                      • Reduces string manipulation algorithm complexity by 
>> an order of magnitude.
>>                      • Can potentially eliminate tens of thousands of string 
>> copies across the codebase.
>>                      • Simplifies code.
>>                      • Risk: 3
>>                      • Impact: 8
>>                      • Difficulty / Effort: 7
> 
> I don't think StringRef needs to be used everywhere, but it many places it 
> does make sense. For example our public API should not contain any LLVM 
> classes in the API. "const char *" is a fine parameter for public functions. 
> I really hate the fact that constructing llvm::StringRef with NULL causes an 
> assertion. Many people don't know that and don't take that into account when 
> making their changes. I would love to see the assertion taken out to tell you 
> the truth. That would make me feel better about using StringRef in many more 
> places within LLDB as we shouldn't ever crash due to this. I would expect 
> most internal APIs are safe to move to llvm::StringRef, but we will need to 
> make sure none of these require a null terminated string which would cause us 
> to have to call "str.str().c_str()". So internally, yes we can cut over to 
> more use of StringRef. But the public API can't be changed.
> 
> 
>>              • ArrayRef instead of const void *, len everywhere
>>                      • Same analysis as StringRef
> 
> Internally yes, external APIs no.
>>              • MutableArrayRef instead of void *, len everywhere
>>                      • Same analysis as StringRef
> Internally yes, external APIs no.
>>              • Delete ConstString, use a modified StringPool that is 
>> thread-safe.
>>                      • StringPool is a non thread-safe version of 
>> ConstString.
>>                      • Strings are internally refcounted so they can be 
>> cleaned up when they are no longer used.  ConstStrings are a large source of 
>> memory in LLDB, so ref-counting and removing stale strings has the potential 
>> to be a huge savings.
>>                      • Risk: 2
>>                      • Impact: 9
>>                      • Difficulty / Effort: 6
> 
> We can't currently rely on ref counting. We currently hand out "const char *" 
> as return values from many public API calls and these rely on the strings 
> living forever. We many many existing clients and we can't change the public 
> API. So I vote to avoid this. We are already using LLVM string pools under 
> the covers and we also associate the mangled/demangled names in the map as is 
> saves us a ton of cycles when parsing stuff as we never demangle (very 
> expensive) the same name twice. So I don't see the need to change this as it 
> is already backed by LLVM technology under the covers.
> 
>>              • thread_local instead of lldb::ThreadLocal
>>                      • This fixes a number of bugs on Windows that cannot be 
>> fixed otherwise, as they require compiler support.
>>                      • Some other compilers may not support this yet?
>>                      • Risk: 2
>>                      • Impact: 3
>>                      • Difficulty: 3
> 
> As long as all compilers support this then this is fine.
> 
>>              • Use llvm::cl for the command line arguments to the primary 
>> lldb executable.
>>                      • Risk: 2
>>                      • Impact: 3
>>                      • Difficulty / Effort: 4
> 
> Easy and fine to switch over to. We might need to port some getopt_long 
> functionality into it if it doesn't handle all of the things that getopt_long 
> does. For example arguments and options can be interspersed in getopt_long(). 
> You can also terminate your arguments with "--". It accepts single dashes for 
> long option names if they don't conflict with short options. Many things like 
> this.
> 
>>      • Testing - Our testing infrastructure is unstable, and our test 
>> coverage is lacking.  We should take steps to improve this.
>>              • Port as much as possible to lit
>>                      • Simple tests should be trivial to port to lit today.  
>> If nothing else this serves as a proof of concept while increasing the speed 
>> and stability of the test suite, since lit is a more stable harness.
> 
> As long a tests use the public API to run test. We are not doing text 
> scraping. 
> 
>>              • Separate testing tools
>>                      • One question that remains open is how to represent 
>> the complicated needs of a debugger in lit tests.  Part a) above covers the 
>> trivial cases, but what about the difficult cases?  In 
>> https://reviews.llvm.org/D24591 a number of ideas were discussed.  We 
>> started getting to this idea towards the end, about a separate tool which 
>> has an interface independent of the command line interface and which can be 
>> used to test.  lldb-mi was mentioned.  While I have serious concerns about 
>> lldb-mi due to its poorly written and tested codebase, I do agree in 
>> principle with the methodology.  In fact, this is the entire philosophy 
>> behind lit as used with LLVM, clang, lld, etc.  
> 
> We have a public API... Why are we going to go and spend _any_ time on doing 
> anything else? I just don't get it. What a waste of time. We have a public 
> API. Lets use it. Not simple enough for people? Tough. Testing a debugger 
> should be done using the public API except when we are actually trying to 
> test the LLDB command line in pexpect like tests. Those and only those are 
> fine to covert over to using LIT and use text scraping. But 95% of our 
> testing should be done using the API that our IDEs use. Using MI? Please no.
>> 
>> I don’t take full credit for this idea.  I had been toying with a similar 
>> idea for some time, but it was further cemented in an offline discussion 
>> with a co-worker.  
>> 
>> There many small, targeted tools in LLVM (e.g. llc, lli, llvm-objdump, etc) 
>> whose purpose are to be chained together to do interesting things.  Instead 
>> of a command line api as we think of in LLDB where you type commands from an 
>> interactive prompt, they have a command line api as you would expect from 
>> any tool which is launched from a shell.
>> 
>> I can imagine many potential candidates for lldb tools of this nature.  Off 
>> the top of my head:
>>      • lldb-unwind - A tool for testing the unwinder.  Accepts byte code as 
>> input and passes it through to the unwinder, outputting a compressed summary 
>> of the steps taken while unwinding, which could be pattern matched in lit.  
>> The output format is entirely controlled by the tool, and not by the 
>> unwinder itself, so it would be stable in the face of changes to the 
>> underlying unwinder.  Could have various options to enable or disable 
>> features of the unwinder in order to force the unwinder into modes that can 
>> be tricky to encounter in the wild.
>>      • lldb-symbol - A tool for testing symbol resolution.  Could have 
>> options for testing things like:
>>              • Determining if a symbol matches an executable
>>              • looking up a symbol by name in the debug info, and mapping it 
>> to an address in the process.  
>>              • Displaying candidate symbols when doing name lookup in a 
>> particular scope (e.g. while stopped at a breakpoint).
>>      • lldb-breakpoint - A tool for testing breakpoints and stepping.  
>> Various options could include:
>>              • Set breakpoints and out addresses and/or symbol names where 
>> they were resolved to.
>>              • Trigger commands, so that when a breakpoint is hit the tool 
>> could automatically continue and try to run to another breakpoint, etc.
>>              • options to inspect certain useful pieces of state about an 
>> inferior, to be matched in lit. 
>>      • lldb-interpreter - tests the jitter etc.  I don’t know much about 
>> this, but I don’t see why this couldn’t be tested in a manner similar to how 
>> lli is tested.
>>      • lldb-platform - tests lldb local and remote platform interfaces.
>>      • lldb-cli -- lldb interactive command line.
>>      • lldb-format - lldb data formatters etc.
> 
> 
> I agree that testing more functionality it a good idea, but if it is worth 
> testing we should see if we can get it into our public API. If so, then we 
> test it there. If not, then we come up with another solution. Even in the 
> alternate solution, it will be something that will probably create structured 
> data (JSON, Yaml, etc) and then parsed, so I would rather spend the time 
> getting these things into out public APIs, or test them vai our public APIs.
> 
>>      • Tests NOW, not later.
>>              • I know we’ve been over this a million times and it’s not 
>> worth going over the arguments again.  And I know it’s hard to write tests, 
>> often requiring the invention of new SB APIs.  Hopefully those issues will 
>> be addressed by above a) and b) above and writing tests will be easier.  
>> Vedant Kumar ran some analytics on the various codebases and found that LLDB 
>> has the lowest test / commit ratio of any LLVM project (He didn’t post 
>> numbers for lld, so I’m not sure what it is there).
>>                      • lldb: 287 of the past 1000 commits
>>                      • llvm: 511 of the past 1000 commits
>>                      • clang: 622 of the past 1000 commits
>>                      • compiler-rt: 543 of the past 1000 commits
>> This is an alarming statistic, and I would love to see this number closer to 
>> 50%.
>>      • Code style / development conventions - Aside from just the column 
>> limitations and bracing styles, there are other areas where LLDB differs 
>> from LLVM on code style.  We should continue to adopt more of LLVM's style 
>> where it makes sense.  I've identified a couple of areas (incomplete list) 
>> which I outline below.  
>>              • Clean up the mess of cyclical dependencies and properly layer 
>> the libraries.  This is especially important for things like lldb-server 
>> that need to link in as little as possible, but regardless it leads to a 
>> more robust architecture, faster build and link times, better testability, 
>> and is required if we ever want to do a modules build of LLDB
>>              • Use CMake instead of Xcode project (CMake supports 
>> Frameworks).  CMake supports Apple Frameworks, so the main roadblock to 
>> getting this working is just someone doing it.  Segmenting the build process 
>> by platform doesn't make sense for the upstream, especially when there is a 
>> perfectly workable solution.  I have no doubt that the resulting Xcode 
>> workspace generated automatically by CMake will not be as "nice" as one that 
>> is maintained by hand.  We face this problem with Visual Studio on Windows 
>> as well.  The solution that most people have adopted is to continue using 
>> the IDE for code editing and debugging, but for actually running the build, 
>> use CMake with Ninja.  A similar workflow should still be possible with an 
>> OSX CMake build, but as I do not work every day on a Mac, all I can say is 
>> that it's possible, I have no idea how impactful it would be on peoples' 
>> workflows.  
>>              • Variable naming conventions
>>                      • I don’t expect anyone is too fond of LLDB’s naming 
>> conventions, but if we’re committed to joining the LLVM ecosystem, then 
>> let’s go all the way.
> 
> Hopefully we can get LLVM and Clang to adopt naming member variables with a 
> prefix... If so, that is my main remaining concern.
> 
>>              • Use more modern C++ and less C
>>                      • Old habits die hard, but this isn’t just a matter of 
>> style.  It leads to safer, more robust, and less fragile code as well.
>>              • Shorter functions and classes with more narrowly targeted 
>> responsibilities
>>                      • It’s not uncommon to find functions that are hundreds 
>> (and in a few cases even 1,000+) of lines long.  We really need to be better 
>> about breaking functions and classes down into smaller responsibilities.  
>> This helps not just for someone coming in to read the function, but also for 
>> testing.  Smaller functions are easier to unit test.
>>              • Convert T foo(X, Y, Error &error) functions to Expected<T> 
>> foo(X, Y) style (Depends on 1.c)
>>                      • llvm::Expected is based on the llvm::Error class 
>> described earlier.  It’s used when a function is supposed to return a value, 
>> but it could fail.  By packaging the error with the return value, it’s 
>> impossible to have a situation where you use the return value even in case 
>> of an error, and because llvm::Error has mandatory checking, it’s also 
>> impossible to have a sitaution where you don’t check the error.  So it’s 
>> very safe.  
> 
> As crashes if you don't check the errors. One of the big differences between 
> LLDB and LLVM/Clang is that asserts are used liberally all over clang which 
> make the code very crashy when used in production with uncontrolled input 
> like we get in the debugger. We will need to be very careful with any changes 
> to make sure we don't increase crash frequency.
> 
>> 
>> Whew.  That was a lot.  If you made it this far, thanks for reading!
>> 
>> Obviously if we were to embark on all of the above, it would take many 
>> months to complete everything.  So I'm not proposing anyone stop what 
>> they're doing to work on this.  This is just my own personal wishlist
> 
> There are many great things in here. As long as we are careful to not 
> increase the crash frequency in LLDB I am all for it. My mains areas of 
> concern are:
> - public API can't change in its current form
> - no LLVM or clang classes in the public API as arguments or return values.
> - don't crash more by introducing assert heavy code into code paths that use 
> input from outside sources 

It seems to me that you are mixing two things: asserting on user inputs vs 
asserting on internal invariants of the system. LLVM is very intensive about 
asserting on the second category and this seems fine to me.
Asserting on external inputs is not great (in LLVM as much as in LLDB).

The asserting error class above falls into the second category and is a great 
tool to enforce programmer error

— 
Mehdi

_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Re: [lldb-dev] LLDB Evolution - Final Form

Reply via email to