Dima Kogan <[email protected]> writes: > I took a pass through the code. It's now of candidate-mergeable quality. > As before, the tree appears in
Sorry this took so long again. I'm a fresh father and things have been hectic the last two weeks or so. > https://github.com/dkogan/ltrace/tree/libsym_aliases_inexportlist 99fe555 (protolib_lookup() now uses a dict* instead of a function that returns it) changes meaning of the code and breaks a lot of test cases. The complexity is in fact necessary. I took 3afa8a5552 and d2091e6f6ef (with a twist, see below). > As usual, some comments and things that may need fixing before this is > pushed: > > 1. I added a integer hash function for uint64_t keys, but kept the math > the same as that for the 32-bit keys. Computing an appropriate constant > is easy, I just haven't done it yet. I added a quick patch that calculates the hash by xor-ing the hashes of the two 32-bit components. At least now the upper half is not completely wasted. Feel free to replace this with whatever you come up with, when you do. Regarding 6db578849 (We now use known prototypes...): Names that start with underscore (_dtor_string, _clone_vect, _dtor_vect) are reserved, we shouldn't use them in ltrace itself. Don't use logical operations in integer context (i.e. where "int" carries an actual value, not a boolean). Instead of this: +static int +library_exported_names_clone(struct library_exported_names *retp, + const struct library_exported_names *names) +{ + return + DICT_CLONE(&retp->names, &names->names, + const char*, uint64_t, + dict_clone_string, _dtor_string, + NULL, NULL, + NULL) || + DICT_CLONE(&retp->addrs, &names->addrs, + uint64_t, struct vect*, + NULL, NULL, + _clone_vect, _dtor_vect, + NULL); +} write this: +static int +library_exported_names_clone(struct library_exported_names *retp, + const struct library_exported_names *names) +{ + if (DICT_CLONE(&retp->names, &names->names, + const char*, uint64_t, + dict_clone_string, _dtor_string, + NULL, NULL, + NULL) < 0 + || DICT_CLONE(&retp->addrs, &names->addrs, + uint64_t, struct vect*, + NULL, NULL, + _clone_vect, _dtor_vect, + NULL) < 0) + return -1; + else + return 0; +} Apart from arbitrary measure of purity, there's an argument of consistency: X || Y returns 1 or 0 depending on nul-ness of its operands, but in ltrace, we signal error by returning 0 for success and a negative value for failure. + int result = DICT_INSERT(&names->names, &name, &addr); + if(result == 1) { This should read: + int result = DICT_INSERT(&names->names, &name, &addr); + if (result > 0) { This: + struct vect **paliases = DICT_FIND_REF(&names->addrs, + &addr, struct vect*); ... technically maps an address to a pointer to a vector of names. We could actually store vectors in the hash table directly. If you would be willing to make this change, that would be great, but I'm not going to push it ;) I'd like this: + result = DICT_INSERT(&names->addrs, &addr, &aliases); + if(result != 0) + return result; ... be written as: + result = DICT_INSERT(&names->addrs, &addr, &aliases); + assert(result <= 0); + if (result < 0) + return result; ... to make it clear that we really don't expect result of > 0. This: + result = vect_pushback(aliases, &namedup); + if(result != 0) + return result; ... should free(namedup). This: +bool library_exported_names_each(const struct library_exported_names *names, + enum callback_status (*cb)(const char *, + void *), + void *data) +{ + struct library_exported_names_each_context context = + {.inner_cb = cb, + .data = data, + .failure = false}; + DICT_EACH(&names->names, + const char*, uint64_t, + NULL, library_exported_names_each_cb, &context); + return !context.failure; +} ... should rather be: +const char ** +library_exported_names_each(const struct library_exported_names *names, + const char **start_after, + enum callback_status (*cb)(const char *, + void *), + void *data) +{ + struct library_exported_names_each_context context = + {.inner_cb = cb, + .data = data, + .failure = false}; + return DICT_EACH(&names->names, const char*, uint64_t, + start_after, library_exported_names_each_cb, &context); +} Iterators in ltrace are generally restartable, and here it's not even very difficult to code the restartability in. (Though I didn't test this, so possibly I'm missing something.) +bool library_exported_names_each_alias( Same thing here as well, const char ** as well, I think. Other than that, there's a number of instances of if not being followed by a space. Same holds for while and for, though I don't recall having seen any offenders. 503e63372 (DWARF prototypes) looks good to me. > 2. The name is still "exported_names". As before, this is > unconditionally filled in. Again, as before, there's a "symbols" field > that IS populated conditionally, and generally has way more "stuff" in > it. These two fields can be merged, but I haven't attempted that. I > can't think of a better name than "exported_names". > "exported_names_addrs" ? I don't really mind the name. It's a kind of historical baggage, and would be nice to merge into one over-arching symtab, but it's not a big deal I guess. > 3. The data structure currently is not ideally efficient. I'm using two > dicts. One to map symbol names -> addrs. And another to map an addr -> > vect(symbol names). This is fine for lookups (I'm assuming the memory > usage is reasonable), but is poor for iteration, which we need for > activate_latent_in(). How bad is it? Ideally one of the hashes would be > a tree of some sort, but there's no such structure in ltrace. I can also > add a 3rd structure that's a linked list used just for this iteration. > I'm tempted to just leave it as is. Hmm, I see what you mean. We do O(n) iteration over alias list and then for each of those we do O(n) iteration over list of symbols. We could store struct library::symbols in a hash table as well, but there are places in ltrace that assume it's a linked list--elf_add_plt_entry, delete_symbol_chain, and likely more. So it's rather invasive change. Why don't we turn the logic around? Go through the list of symbols, and for each of them, consult the vector of aliases that we found out earlier. You end up comparing each symbol with the same set of select few strings (typically one), which will be fast in practice, as those strings will likely stay in cache. We could implement name interning as well, which would avoid the round-trip through strcmp. But I'm rather skeptical about these sorts of performance tweaks. Kernel wastes so much time context-switching between ltrace and the tracee anyway, that these things are noise, I suspect. But if you have performance numbers, I'm not against it. > 4. I discovered that in the libc on my machine (Debian/sid amd64) some > symbols appear at multiple addresses: [...] > Those are versioned symbols, with different implementation for different > libc versions. This same-symbol-different-address idea doesn't fit into > the data structures, as I've currently defined them. I explicitly ignore > any such repetitions to avoid printing out useless errors. Again, I'm > tempted to leave it as is, since there just aren't many of such > functions, and it doesn't sound worth it to me to deal with it. ltrace doesn't really handle versions at all, except it knows to ignore them (see e.g. populate_this_symtab in ltrace-elf.c). Generally it's of course meaningful to promote versions to first-class thing, or at least fudge it somehow, because it may be useful to provide different prototypes for different versions of the same symbol. Besides, users might want to see what symbol version ends up servicing a call. But right now your approach makes sense. Thank you, PM _______________________________________________ Ltrace-devel mailing list [email protected] http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/ltrace-devel
