That's the basics, and that's indeed how dwarf_comparator works (depth first).
> - The attribute values can also be references to DIEs. This might be new > DIEs, but also DIEs "up" in the DIE children tree. This is why the DIE > tree is actually a (pointed) graph. This is indeed the only reason the whole thing is hairy at all. > - For two DIEs to be equal the attribute values of the attributes with > the same key names that have references to DIEs as value should: > a) Be both not yet seen DIEs in the respective trees so far, and > be equal DIEs (in this case we can just treat the attribute > reference as we would a child DIE, compare it in order). > Or: > b) Be both already seen DIEs in the respective trees so far, and be > in the "same location". I don't understand the "already seen" or not distinction here. Whether they are forward references or not doesn't matter in any way I can think of (except in how you'd implement it, of course). Regardless, you need the subtrees to be equal and their locations to be equivalent. > The "same location" means that the references to the DIEs in the trees > are at the same place if we would walk the tree in the order chosen > above. Approximately, yes. I've called this "equivalent context" rather than "same location" to abstract it a little. The definition for "exact equivalence" I mean is that each parent DIE has matching attributes and equivalent parents. That's a recursive definition, so it iterates on up to the root of the tree (the CU). For the overall goal of maximal sharing constrained only by the real high-level semantic needs to avoid wrong conflations, we'll want some flexibility to ignore some kinds of attribute differences. This decision is encoded in dwarf_comparator::equal_enough. The current definition of "equal enough" is that we ignore the reference attributes. That's as much as anything else just because another vector of complex equality checking with references would just make things even harder to think about. In practice, I'm pretty sure the differences between "context" DIEs that matter semantically are not in their references. But, in the abstract, if it were easy to implement, then the ideal place to start would be "exact equivalence", references and all, and whittle down from there. > To keep the order of the nodes I would keep two stacks of DIEs that > represent the path traveled so far. That way whether or not a DIE has > already been seen is just whether it is already on the stack, and the > "location" can just be expressed as the index of that DIE into the > stack. The comparator does a depth-first recursive traversal. So it has a live call frame associated with each DIE currently under consideration. Remembering anything beyond that is done by the tracker. The comparison constructs a tracker::step object when it considers a pair of DIEs. This object and its destructor are what trigger the tracker code to record where we are right now. It tracks each of the two parallel walks with a dwarf_path_finder. Each of these holds a a stack of DIEs, called a die_path in the code. A step on the walk (tracker::step -> dwarf_path_finder::step) pushes the new DIE on the stack. It also records the whole current stack in a map keyed by that DIE's identity. Looking these up are how we can consider the "context" when we have an attribute reference to this DIE later on. > Does any of the above make sense? Is this what the comparator/tracker > logic does? Are there wrinkles in the above logic that make it more > involved? What kind of caching should be added to make the above > easier/better/faster? You've asked a lot and we'll have to get to it a bit at a time. I'm not sure I followed exactly what you meant about a stack or how it relates to the tracking we have going on now. The definition of "equal" is recursive, in that for two DIEs to be equal their references have to be to equal DIEs. Yet the referenced DIEs may have references of their own that have to be compared. After one step or many steps, references could lead back to the first DIE. Not only do you have to terminate, but two DIEs with congruently circular references must be considered equal. If you follow forward references immediately to answer the equality question, you will cover lots of the tree out of order relative to the main comparison walk. If you did no caching, you would repeat many subtree comparisons many many times. So the tracker also records equivalences. If you compare two subtrees once in the main walk, you want to record that they were or weren't equal, so that you know that answer immediately when you later come across a need to compare reference attributes pointing to each. Likewise, if you follow a forward reference and compare referenced subtrees, you want to record that result so that when you come across that subtree later in the main walk (or again via another reference) then you don't repeat that comparison. tracker::reference_match is an untidy mashup of both the attempts at handling circularities and the caching of comparison results. It isn't doing either correctly. Thanks, Roland _______________________________________________ elfutils-devel mailing list [email protected] https://fedorahosted.org/mailman/listinfo/elfutils-devel
