[lldb-dev] New DWARF parser proposal

Richard Mitton Fri, 30 Aug 2013 13:29:41 -0700

DWARF parsing is currently very slow. On my machine, loading the 'clang' binary into lldb takes 14 seconds (vs. gdb's 29 seconds). The actual I/O cost of reading that much data is only around 2 seconds.

The DWARF parser has to read the entire .debug_info section into memory and parse it. It would be great if we didn't have to do this. OS X has the .apple_names section et al, which allow lldb to automatically have an index without having to parse anything.

However, this is an Apple extension and does not exist on other platforms. There are a bunch of accelerator tables that the DWARF spec allows for, but they're all unusable. pubnames is either absent or incomplete, aranges might not be present, and if they were generated by the compiler rather than the linker, then they may not cover all the object files in the binary (causing lldb to miss symbols). So we end up in the situation we have now, where we cannot use or trust the accelerators, and have to parse everything anyway to build our own index.

I believe lldb does this the wrong way. Performing just a simple "b main" test, it will touch the entire debug_info section *4 times* (at least), which on the clang binary example is 600MB of data each time:


- pass 1 (extract DIEs)
- pass 2 (index DIEs)
- pass 3 (extract DIEs)
- pass 4 (build arange tables)

I believe the key problems of the current design are:

1) lldb tries to build its own DIE array copy, rather than just referring to the existing data in-place. This adds a significant management overhead to all functions accessing this data.

2) lldb goes to great efforts to avoid reading the entire debug information (even though it will ultimately need to anyway) and to avoid keeping it in memory. This in fact causes it to *reload* it several times, as each further operation performs lazy initialization and causes a re-parse.

If we just accepted that we are forced to load all the data once, it would actually be faster. My suggestion therefore is to write an optimized single-pass DWARF indexer to replace the current DWARF loader, with the following properties:

- Always read debug info exactly once upon module load (unless we can guarantee Apple extensions are used).
- Use the entire debug_info section in-place, without trying to build a copy. Not having separate stages for extraction and indexing will allow efficient data traversal.
- Make use of the abbreviation tables to pre-build decoder structures for each DWARF tag type. In most cases we can know the size of each DIE the moment we read its abbreviation code in, and can skip in one operation if needed without having to parse the elements. Because we run in one pass, we never have to even look at DIEs we don't need.
- Track the parent scope as we go, on a stack, so we don't have to keep doing lookups which walk the DIE tree. The current parser walks up the tree to find what scope it's in, even though it already parsed the parent scope container.
- Build arange tables automatically as we go, ignoring any that might already be present. We have already touched and extracted the range data anyway, so it would be trivial to build an accelerator table for free.
- For strings, we should pre-pool the DWARF string table once up-front, to avoid repeatedly pooling strings for each DIE.
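To make the abbreviation-table point concrete, here is a minimal sketch in Python (for brevity; it is not lldb code). It assumes 32-bit DWARF with 8-byte target addresses, and all the names (FIXED_FORM_SIZES, build_skip_table, skip_die) are hypothetical. The form codes are the standard DW_FORM_* values. For each abbreviation whose attributes all have fixed-size forms, the total DIE size is computed once, so a DIE using it can be skipped with a single addition.

```python
# Byte sizes of some fixed-size DWARF attribute forms.
# Assumption: 32-bit DWARF format and 8-byte target addresses.
FIXED_FORM_SIZES = {
    0x01: 8,  # DW_FORM_addr
    0x05: 2,  # DW_FORM_data2
    0x06: 4,  # DW_FORM_data4
    0x07: 8,  # DW_FORM_data8
    0x0B: 1,  # DW_FORM_data1
    0x0C: 1,  # DW_FORM_flag
    0x0E: 4,  # DW_FORM_strp
    0x13: 4,  # DW_FORM_ref4
    0x17: 4,  # DW_FORM_sec_offset
    0x19: 0,  # DW_FORM_flag_present
}


def read_uleb128(data, offset):
    """Decode an unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7


def build_skip_table(abbrevs):
    """Map abbreviation code -> total attribute size in bytes, or None
    if any attribute uses a form that is not in FIXED_FORM_SIZES."""
    table = {}
    for code, forms in abbrevs.items():
        sizes = [FIXED_FORM_SIZES.get(form) for form in forms]
        table[code] = None if None in sizes else sum(sizes)
    return table


def skip_die(data, offset, skip_table):
    """Return the offset just past the DIE at `offset`, without
    decoding any of its attributes."""
    code, offset = read_uleb128(data, offset)
    if code == 0:
        return offset  # null entry: end of a sibling chain
    size = skip_table[code]
    if size is None:
        raise NotImplementedError("variable-size DIE: parse per attribute")
    return offset + size
```

A real indexer would still need a per-attribute fallback for variable-size forms such as DW_FORM_string or DW_FORM_exprloc; the sketch only raises in that case.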

With this approach we use the DIE data as-is in memory, without having to make our own copy.

Parent chains should ideally only be used during parsing. If parents/siblings are really needed after the initial parse, one easy solution would be to just store that in a separate hash table.
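As a rough illustration of tracking the parent scope on a stack and keeping parent links in a side table, here is a short Python sketch. The input format is an assumption made for the example: each entry is an already-decoded (offset, has_children) pair in section order, with None standing in for the null entry that closes a sibling chain. The function name index_parents is hypothetical.

```python
def index_parents(entries):
    """Walk DIEs once, in section order, and record each DIE's parent.

    `entries` yields (offset, has_children) tuples, or None for a null
    entry that terminates the current list of children.
    Returns a dict mapping DIE offset -> parent DIE offset (None for
    top-level DIEs).
    """
    parent_of = {}
    scope_stack = []  # offsets of the enclosing DIEs, innermost last
    for entry in entries:
        if entry is None:
            # End of the children of the innermost open scope.
            if scope_stack:
                scope_stack.pop()
            continue
        offset, has_children = entry
        parent_of[offset] = scope_stack[-1] if scope_stack else None
        if has_children:
            scope_stack.append(offset)
    return parent_of
```

The parent of every DIE is known the moment it is read, so nothing has to walk back up the tree, and the dictionary is only needed if parent lookups are required after the initial parse.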

I welcome discussion on this. I think it's important for lldb to not have any delays on loading programs, and as we cannot control what the compilers will supply to us, we have to address this on our end.


--
Richard Mitton
[email protected]

