Thanks for the detailed response, David.

On Fri, Apr 9, 2021 at 2:52 PM David Blaikie <dblai...@gmail.com> wrote:
> I'm not suggesting scanning all of .debug_info - only the CU DIE for
> DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
> unit header's next unit offset).
>
> It sounded like CU ranges couldn't be used to build such an index at
> all/that your code used quite a different strategy in the absence of
> aranges? (rather than building the index from the CU ranges - somewhat
> slower I'm sure, but I wouldn't've thought (& am trying to understand
> if it is/why) so fundamentally slower that it wouldn't be the next
> fallback rather than skipping the index entirely or employing some
> more fundamentally different approach)

This is still significantly less dense than aranges and involves more
disk I/O and memory pressure. Let me see what optimizations I can
implement here and get back to you with the results / what I came up
with. That would be a better basis for an apples-to-apples comparison.

> If you mean building ranges from all the DIEs deep inside a CU - yeah,
> that's going to be fundamentally slower in a bunch of ways that maybe
> I could see that would necessitate a totally different approach/that
> the index wouldn't make sense anymore (though I'd still like to
> understand it) - but I'm especially curious about the case where the
> CU DIE itself does have comprehensive address range information.

Will report back on this.

> - Dave
>
> >>> (+ complexities Greg mentions later in the thread). In cases where
> >>> we lack this, we use our own persistent cache which introduces
> >>> unnecessary complexity. Now I am considering going as far as adding
> >>> a multi-threaded indexer for cases where a persistent cache / build
> >>> system modifications aren't an option (work to begin in the next
> >>> week or two).
> >>>
> >>> .debug_aranges would provide a lot of value to our users.
> >>>
> >>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss
> >>> <dwarf-discuss@lists.dwarfstd.org> wrote:
> >>>>
> >>>> On Thu, Mar 11, 2021 at 5:48 AM <paul.robin...@sony.com> wrote:
> >>>>>
> >>>>> Hopefully not to side-track things too much... maybe wants its own
> >>>>> thread, if there's more to debate here.
> >>>>
> >>>> Yeah, how about we spin it off into another thread (done here)
> >>>>
> >>>>> >> For the case you suggested where it would be useful to keep the
> >>>>> >> range list for the CU in the .o file, I think .debug_aranges is
> >>>>> >> what you're looking for.
> >>>>> >
> >>>>> > aranges has been off by default in LLVM for a while - it adds a
> >>>>> > lot of overhead (doesn't have all the nice rnglist encodings for
> >>>>> > instance - nor can it use debug_addr, and if it did it'd still be
> >>>>> > duplicate with the CU ranges wherever they were).
> >>>>>
> >>>>> Did you want to file an issue to improve how .debug_aranges works?
> >>>>
> >>>> I don't currently understand the value it provides, and I at least
> >>>> don't have a use case for it, so I'm not sure I'd be the best
> >>>> person to advocate/drive that work.
> >>>>
> >>>>> Complaining that it duplicates CU ranges is missing the point,
> >>>>> though; it's an index, like .debug_names, of course it duplicates
> >>>>> other info. If you want to suggest an improved index, like we did
> >>>>> with .debug_names, that would be great too.
> >>>>
> >>>> .debug_names is quite different though - it collects information
> >>>> from across the DIE tree - information that is expensive to
> >>>> otherwise gather (walking the whole DIE tree).
> >>>>
> >>>> .debug_aranges is not like that for most producers (producers that
> >>>> do include the address ranges on the CU DIE) - the data is readily
> >>>> available immediately on the CU. That does involve reading some of
> >>>> .debug_abbrev, and interpreting a handful of attributes - but at
> >>>> least for the use cases I'm aware of, that overhead isn't worth the
> >>>> size increase.
> >>>>
> >>>> Do you have numbers on the benefits of .debug_aranges compared to
> >>>> parsing the ranges from CU DIEs?
> >>>>
> >>>> (one possible issue: the CU doesn't /have/ to contain
> >>>> low/high/ranges if its children DIEs contain addresses - having
> >>>> that as a guarantee, or some preferred way of encoding zero length
> >>>> (high/low of 0 would be acceptable, I guess) would be nice & make
> >>>> it cheap to skip over CUs that don't have any address ranges)
> >>>>
> >>>> Roughly, a modern debug_aranges to me would look something like:
> >>>>
> >>>> <length>
> >>>> <version>
> >>>> <CU sec_offset>
> >>>> <addr_base>
> >>>> <rnglist sec_offset>
> >>>>
> >>>> So it could fully re-use the rnglist encoding. If this was going to
> >>>> be as compact as possible, it'd need to be configurable which
> >>>> encodings it uses - ranges V high/low, addrx V addr - at which
> >>>> point it'd probably look like a small DIE with an inline abbrev
> >>>> (similar to the way DWARFv5 encodes the file and directory entries
> >>>> now, and how debug_names is self-describing) - at which point it
> >>>> looks to me a lot like parsing the CU DIEs.
> >>>>
> >>>> _______________________________________________
> >>>> Dwarf-Discuss mailing list
> >>>> Dwarf-Discuss@lists.dwarfstd.org
> >>>> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
> >>>
> >>> --
> >>> Samy Al Bahra [http://repnop.org]
> >
> > --
> > Samy Al Bahra [http://repnop.org]

--
Samy Al Bahra [http://repnop.org]
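
For reference, the "skip to the next CU via the unit header" step discussed above can be sketched roughly as follows. This is a minimal illustration rather than anyone's actual implementation: it assumes the raw .debug_info bytes are already in memory, and the names `UnitHeader` and `iter_unit_headers` are made up for this example. It only hops between unit headers using the initial length field; it does not decode the CU DIE attributes.

```python
import struct
from typing import Iterator, NamedTuple


class UnitHeader(NamedTuple):
    # Hypothetical record type, introduced only for this sketch.
    offset: int       # where this unit starts in .debug_info
    version: int      # DWARF version read from the header
    is_dwarf64: bool  # True if the 64-bit initial-length form was used
    next_offset: int  # where the next unit header starts


def iter_unit_headers(debug_info: bytes,
                      little_endian: bool = True) -> Iterator[UnitHeader]:
    """Hop from unit to unit using only the initial length field."""
    order = "<" if little_endian else ">"
    offset = 0
    while offset < len(debug_info):
        (length,) = struct.unpack_from(order + "I", debug_info, offset)
        if length == 0xFFFFFFFF:
            # 64-bit DWARF: escape value followed by an 8-byte length.
            (length,) = struct.unpack_from(order + "Q", debug_info, offset + 4)
            length_size = 12
        else:
            length_size = 4
        next_offset = offset + length_size + length
        if length < 2 or next_offset > len(debug_info):
            raise ValueError(f"corrupt unit_length at offset {offset:#x}")
        # The 2-byte version immediately follows the initial length.
        (version,) = struct.unpack_from(order + "H", debug_info,
                                        offset + length_size)
        yield UnitHeader(offset, version, length_size == 12, next_offset)
        offset = next_offset


# Two synthetic units (a v4-style and a v5-style header) with dummy bodies.
unit_v4 = struct.pack("<IHIB", 10, 4, 0, 8) + b"\x00" * 3
unit_v5 = struct.pack("<IHBBI", 10, 5, 1, 8, 0) + b"\x00" * 2
headers = list(iter_unit_headers(unit_v4 + unit_v5))
```

Reading DW_AT_ranges or low/high_pc from each CU DIE would additionally require decoding that unit's abbreviation from .debug_abbrev, which is the extra work mentioned in the thread and is not shown here.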
_______________________________________________ Dwarf-Discuss mailing list Dwarf-Discuss@lists.dwarfstd.org http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org