Lars Volker has posted comments on this change. Change subject: IMPALA-3489: Add script to extract breakpad symbols from binaries ......................................................................
Patch Set 4: (28 comments) http://gerrit.cloudera.org:8080/#/c/2961/4/bin/dump_breakpad_symbols.py File bin/dump_breakpad_symbols.py: Line 1: #!/usr/bin/env python > Is there a reason not to use impala-python instead? The script didn't need anything impala specific (at least until now, before addressing the comments) so I thought it would be nice for it not to depend on the impala env. I changed this, however I also needed to add python-magic (the package) to our requirements. All in all I would still prefer the script to be as self-contained as possible. Let me know which way of handling the dependency (python-magic) seems preferable to you. Line 17: # in Google breakpad > Nit: Breakpad Done Line 19: # - Scan an impala build dir for ELF files > Nit: Impala Done Line 22: # - Extract an rpm and corresponding debuginfo rpm file > This is specific to Impala / CDH RPMs, right? Please clear this up since de Done Line 29: # - Goodle breakpad, either installed via the impala toolchain or separately > s/Goodle/Google/ Done Line 37: # ./dump_breakpad_symbols -d /tmp/syms -r tmp/impala-2.5.0+cdh5.7.0+0-1.cdh5.7.0.p0.147.el6.x86_64.rpm -s tmp/impala-debuginfo-2.5.0+cdh5.7.0+0-1.cdh5.7.0.p0.147.el6.x86_64.rpm > Very long line. Please wrap with \ chars. Done, however I still needed the leading #, so copy and paste won't work anymore. Line 47: # $IMPALA_TOOLCHAIN/breakpad-*/bin/minidump_stackwalk /tmp/impala-minidumps/impalad/03c0ee26-bfd1-cf3e-43fa49ca-1a6aae25.dmp /tmp/syms > Another long line. Done Line 61: Binary = namedtuple("Binary", "path, debug_path") > This is more than just a "Binary", right? It contains a mapping of a binary Sure. How about BinaryDebugInfo? BinaryDebugTuple? BinaryDebugPair? Line 64: def log(msg): : """Log a message to stdout.""" : print msg > Do you need this? Why not use the logging module? A lot of our python scrip Done Line 71: sys.stderr.write("ERROR: %s\n" % msg) > If you used the logging module, this could be an error-level message. Done Line 76: breakpad > nit: capital B Done Line 86: dump_syms = glob.glob(os.path.join(toolchain, "breakpad-*", "bin", "dump_syms")) > This seems risky. A long-lived working copy of Impala will have a toolchain Done, see top comment. Line 100: Path to a directory containing results : from an impala build, e.g. be/build/debug > I think this will make the help message look strange, "from an impala..." w Seems to work fine. My guess would be argparse calls textwrap.dedent() on the string. Here's how it looks for me: -b BUILD_DIR, --build_dir BUILD_DIR Path to a directory containing results from an Impala build, e.g. be/build/debug Line 112: # Post processing checks > Maybe also verify that, for example, if binary_files were specified then st Done Line 126: if e.errno != errno.EEXIST: > I think you should also check that path is a directory. Maybe rewrite like Done Line 136: def is_file(path): > is_regular_file ? Done Line 142: """Check whether 'path' is an ELF file.""" > How about adding the is_file check here? Done Line 167: result = [] > Not used. Done Line 176: binary_base = os.path.join(tmp_dir, 'usr', 'lib', 'impala') : debuginfo_base = os.path.join(tmp_dir, 'usr', 'lib', 'debug', 'usr', 'lib', 'impala') > It might be good to make 'usr/lib/impala' and 'usr/lib/debug/usr/lib/impala Done Line 193: if (args.binary_files): > I don't believe the parens are necessary. Done Line 195: if args.stdin_files: > It seems like this chain could all have elif instead. Done Line 201: return [] > The script should not get to this point, maybe some kind of error message s Done Line 227: with open(tmp_file, 'r') as f: > The tmp_fd is never closed before you reopen the same file. One the temporary file object in 218 is destroyed the underlying fd is closed: http://stackoverflow.com/questions/7114059/does-closing-a-file-opened-with-os-fdopen-close-the-os-level-fd I ran the test program from there to confirm. Line 230: _, _, _, id, binary = header.split(' ') : out_path = os.path.join(out_dir, binary, id) > The use of the 'id' identifier here technically masks the Python builtin fu Yes. I wasn't sure what our policies are on this. Renamed it. Line 235: except: > At the very least this should be: Done. I learned about BaseException on the way. :) Line 237: os.remove(tmp_file) > Catch EnvironmentError here and...I dunno, print an error and move on? Done Line 238: raise > Assuming you make the L235 change... Done Line 246: success = True > maybe change this to status = 0, then after the loop you can do: sys.exit(s Done -- To view, visit http://gerrit.cloudera.org:8080/2961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ee0972efcb50609407b04cd6f4309b244a84861 Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
