> On Aug 3, 2018, at 6:17 PM, Leonard Mosescu <[email protected]> wrote:
>
> Greg, Mark,
>
> Looking at the code, LLDB falls back to a full file crc32 to create the
> module UUID if the ELF build-id is missing. This works, in the sense that the
> generated UUID does indeed identify the module.
>
> But there are a few problems with this approach:
>
> 1. First, runtime performance: a full file crc32 is a terribly inefficient
> way to generate a temporary UUID that is basically just used to match a local
> file to itself.
> - especially when some unstripped binaries can be very large. for example a
> local chromium build produces a 5.3Gb chrome binary
> - the crc32 implementation is decent, but single-threaded
> - to add insult to the injury, it seems a small bug defeats the intention to
> cache the hash value so it ends up being recalculated multiple times
>
> 2. The fake UUID is not going to match any external UUID that may be floating
> around (and yet not properly embedded into the binary)
> - an example is Breakpad, which unfortunately also attempts to make up UUIDs
> when the build-id is missing (something we'll hopefully fix soon)
>
> Is there a fundamental reason to calculate the full file crc32? If not I
> propose to improve this based on the following observations:
>
> A. Model the reality more accurately: an ELF w/o a build-id doesn't really
> have an UUID. So use a zero-length UUID in LLDB.
> B. The full file name should be enough to prove the identity of a local
> module.
> C. If we try to match an external UUID (ex. from a minidump) with a local
> file which does not have an UUID it may help to have an option to allow it to
> match (off by default, and only if there's no better match)
>
> What do you think?
I am fine with all the above except some reservations about case C. No need to
calculate something if it isn't useful. For case C it should be fine to never
match as if a file has a UUID to begin with it typically isn't something that
gets stripped in a stripped binary. So we should either have it or not. If
breakpad does calculate a CRC32, then we need to know to ignore the UUID. The
problem is we probably won't be able to tell what the UUID is: real from build
ID, or from GNU debug info CRC, or CRC of entire file. So the minidump code
will need to do something here. If a minidump has the linux auxv and memory map
in them, then we might need to dig through the section information and deduce
if a file matches or not based off the size of mapped program headers to
further help with the matching.
One other idea is to make a set of enumerations for the UUID type:
class UUID {
enum class Type {
BuildID, // A build ID from the compiler or linker
GNUDebugInfoCRC, // GNU debug info CRC
MD5, // MD5 of entire file
MD5NonDebug, // MD5 of the non debug info related bits
CRC32, // CRC32 of entire file
Other, // Anything else
};
};
The eTypeMD5NonDebug is what apple does: it MD5 checksums only the parts of the
file that don't change with debug info or any paths found in debug info or
symbols tables. So if you build a binary in /tmp/a or in
/private/var/local/foo, the UUID is the same if the binary is essentially the
same (code, data, etc).
Then we can make intelligent comparisons between UUID types. Might even be
possible for a module to have more than 1 UUID then if a binary contains a
eTypeBuildID and a eTypeGNUDebugInfoCRC. If a tool stores its UUIDs as a CRC32
or MD5, then those can be calculated on the fly. The GetUUID on
lldb_private::Module might become:
const lldb_private::UUID &Module::GetUUID(UUID::Type uuid_type);
Thoughts?
Greg
>
> Thanks,
> Lemo.
>
_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev