DWARF parsing is currently very slow. On my machine, loading the 'clang' binary into lldb takes 14 seconds (vs. gdb's 29 seconds). The actual I/O cost of reading that much data is only around 2 seconds.

The DWARF parser has to read the entire .debug_info section into memory and parse it. It would be great if we didn't have to do this. OS X has the .apple_names section et al, which allow lldb to automatically have an index without having to parse anything.

However, this is an Apple extension and does not exist on other platforms. There are a bunch of accelerator tables that the DWARF spec allows for, but they're all unusable. pubnames is either absent or incomplete, aranges might not be present, and if they were generated by the compiler rather than the linker, then they may not cover all the object files in the binary (causing lldb to miss symbols). So we end up in the situation we have now, where we cannot use or trust the accelerators, and have to parse everything anyway to build our own index.

I believe lldb does this the wrong way. Performing just a simple "b main" test, it will touch the entire debug_info section *4 times* (at least), which on the clang binary example is 600MB of data each time:

- pass 1 (extract DIEs)
- pass 2 (index DIEs)
- pass 3 (extract DIEs)
- pass 4 (build arange tables)

I believe the key problems of the current design are:
1) lldb tries to build its own copy of the DIE array, rather than just referring to the existing data in place. This adds significant management overhead to every function that accesses this data.
2) lldb goes to great lengths to avoid reading the entire debug information (even though it will ultimately need to anyway) and to avoid keeping it in memory. This in fact causes it to *reload* the data several times, as each further operation performs lazy initialization and triggers a re-parse.

If we just accepted that we are forced to load all the data once, it would actually be faster. My suggestion therefore is to write an optimized single-pass DWARF indexer to replace the current DWARF loader, with the following properties:

- Always read debug info exactly once upon module load (unless we can guarantee the Apple extensions are used).
- Use the entire debug_info section in place, without trying to build a copy. Not having separate stages for extraction and indexing will allow efficient data traversal.
- Make use of the abbreviation tables to pre-build decoder structures for each DWARF tag type. In most cases we can know the size of each DIE the moment we read its abbreviation code, and can skip it in one operation if needed without having to parse the elements. Because we run in one pass, we never have to even look at DIEs we don't need.
- Track the parent scope as we go, on a stack, so we don't have to keep doing lookups which walk the DIE tree. The current parser walks up the tree to find what scope it's in, even though it has already parsed the parent scope container.
- Build arange tables automatically as we go, ignoring any that might already be present. We have already touched and extracted the range data anyway, so it would be trivial to build an accelerator table for free.
- For strings, we should pre-pool the DWARF string table once up front, to avoid repeatedly pooling strings for each DIE.
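
To make the abbreviation-table point concrete, here is a rough sketch of what a pre-built decoder could look like. It is not lldb code: `Form`, `AbbrevDecoder`, `BuildDecoder` and `SkipFixedDie` are hypothetical names, and only a small subset of DWARF forms is modelled. The idea is to add up the attribute sizes once per abbreviation, so that a DIE whose forms are all fixed-size can be skipped with a single addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of DWARF attribute forms, for illustration only.
enum class Form { Data1, Data2, Data4, Data8, Addr, Strp, Udata, String };

// Byte size of a fixed-size form, or 0 if the form is variable-length
// (ULEB128 values and inline null-terminated strings).
uint32_t FixedFormSize(Form f, uint32_t addr_size, uint32_t offset_size) {
  switch (f) {
    case Form::Data1: return 1;
    case Form::Data2: return 2;
    case Form::Data4: return 4;
    case Form::Data8: return 8;
    case Form::Addr:  return addr_size;    // target address size
    case Form::Strp:  return offset_size;  // 4 in 32-bit DWARF, 8 in 64-bit
    case Form::Udata:
    case Form::String: return 0;
  }
  return 0;
}

// Built once per abbreviation declaration, reused for every DIE that uses it.
struct AbbrevDecoder {
  std::vector<Form> forms;
  bool is_fixed = true;     // true if every attribute has a fixed size
  uint32_t fixed_size = 0;  // total attribute bytes when is_fixed is true
};

AbbrevDecoder BuildDecoder(const std::vector<Form>& forms,
                           uint32_t addr_size, uint32_t offset_size) {
  AbbrevDecoder d;
  d.forms = forms;
  for (Form f : forms) {
    uint32_t size = FixedFormSize(f, addr_size, offset_size);
    if (size == 0) {  // one variable-length form makes the whole DIE variable
      d.is_fixed = false;
      d.fixed_size = 0;
      break;
    }
    d.fixed_size += size;
  }
  return d;
}

// Given the offset just past a DIE's abbreviation code, return the offset of
// the next DIE without decoding any attribute.
size_t SkipFixedDie(const AbbrevDecoder& d, size_t attr_offset) {
  assert(d.is_fixed && "variable-length DIEs must be decoded form by form");
  return attr_offset + d.fixed_size;
}
```

DIEs that contain a variable-length form would still need a form-by-form walk, but that decision is also made once per abbreviation rather than once per DIE.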

With this approach we use the DIE data as-is in memory, without having to make our own copy. Parent chains should ideally only be used during parsing. If parents/siblings are really needed after the initial parse, one easy solution would be to just store that in a separate hash table.
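
As a rough illustration of the scope stack and the side table, the sketch below records each DIE's parent offset in a hash table during a single pass. It assumes the DIE stream has already been reduced to a hypothetical `DieEvent` list in section order; `DieEvent`, `kNoParent` and `BuildParentTable` are made-up names, not existing lldb types.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, already-decoded view of the DIE stream in section order.
struct DieEvent {
  uint64_t offset;    // offset of the DIE in .debug_info (unused for null entries)
  bool is_null;       // null entry (abbreviation code 0) that ends a sibling chain
  bool has_children;  // DW_CHILDREN_yes in the DIE's abbreviation
};

// Marker for DIEs with no parent (the compile unit DIE).
static const uint64_t kNoParent = UINT64_MAX;

// Single pass over the DIE stream: the stack holds the offsets of the
// currently open scopes, so the parent of any DIE is simply the top of the
// stack and no walk up the tree is ever needed.
std::unordered_map<uint64_t, uint64_t>
BuildParentTable(const std::vector<DieEvent>& events) {
  std::unordered_map<uint64_t, uint64_t> parent_of;
  std::vector<uint64_t> scope;
  for (const DieEvent& e : events) {
    if (e.is_null) {
      // End of the current sibling chain: leave the enclosing scope.
      if (!scope.empty()) scope.pop_back();
      continue;
    }
    parent_of[e.offset] = scope.empty() ? kNoParent : scope.back();
    if (e.has_children) scope.push_back(e.offset);
  }
  return parent_of;
}
```

The DIE data itself stays untouched in the mapped section; only the offsets go into the table, and the table could be skipped entirely if nothing needs parents after the initial parse.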

I welcome discussion on this. I think it's important for lldb to not have any delays on loading programs, and as we cannot control what the compilers will supply to us, we have to address this on our end.

--
Richard Mitton
[email protected]

_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
