The Paradyn project has two new technical reports, one in the area of binary code analysis and the other MRNet scalability.
Our full list of project publications always can be found at: http://www.paradyn.org/html/publications-by-year.html Comments and feedback on these papers is always welcome! ------------------------------------------------------------------------------ "Binary Code Is Not Easy", Xiaozhu Meng and Barton P. Miller Submitted for publication, March 2015. ftp://ftp.cs.wisc.edu/paradyn/papers/Meng15Parsing.pdf Abstract" Binary code analysis is an enabling technique for many applications. Binary code analysis tool kits provide several capabilities to automatically analyze binary code, including decoding machine instructions, building control flow graphs of the program, and determining function boundaries. Modern compilers and run-time libraries have introduced significant complexities to binary code. These complexities negatively affect the capabilities of binary analysis tool kits, which may cause tools to report inaccurate information about binary code. Analysts may hence be confused and applications based on binary analysis tool kits may have degrading quality. We examine the problem of constructing control flow graphs from binary code and labeling the graphs with accurate function boundary annotations. We identify eight challenging code constructs that represent hard-to-analyze aspects of the code from modern compilers, and show code examples that illustrate each code construct. As part of this discussion, we discuss how the new code parsing algorithms in the open source Dyninst tool kit support these constructs. In particular, we present a new model for describing jump tables that improves our ability to precisely determine the control ow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls, code overlapping between functions, and code overlapping within instructions. We created test binaries patterned after each challenging code construct and evaluated how tool kits, including Dyninst, fare when parsing these binaries. ------------------------------------------------------------------------------ "Mr. Scan: A Hybrid/Hybrid Extreme Scale Density Based Clustering Algorithm", Benjamin Welton and Barton Miller, submitted for publication, March 2015. ftp://ftp.cs.wisc.edu/paradyn/papers/Welton15MrScan.pdf Abstract: Density-based clustering algorithms are a widely-used class of data mining techniques that can find irregularly shaped clusters and cluster data without prior knowledge of the number of clusters the data contains. DBSCAN is the most well-known density-based clustering algorithm. We introduce our extension of DBSCAN, called Mr. Scan, which uses a hybrid/hybrid parallel implementation that combines the MRNet tree-based distribution network with GPU-equipped nodes. Mr. Scan avoids the problems encountered in other parallel versions of DBSCAN, such as scalability limits, reduction in output quality at large scales, and the inability to effectively process dense regions of data. Mr. Scan uses effective data partitioning and a new merging technique to allow data sets to be broken into independently processable partitions without the reduction in quality or large amount of node-to-node communication seen in other parallel versions of DBSCAN. The dense box algorithm designed as part of Mr. Scan allows for dense regions to be detected and clustered without the need to individually compare all points in these regions to one another. Mr. Scan was tested on both a geolocated Twitter dataset and image data obtained from the Sloan Digital Sky Survey. In testing Mr. Scan we performed end-to-end benchmarks measuring complete application run time from reading raw unordered input point data from the file system to writing the final clustered output to the file system. The use of end-to-end benchmarking gives a clear picture of the performance that can be expected from Mr. Scan in real world use cases. At its largest scale, Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU nodes on Cray Titan in 7.5 minutes ------------------------------------------------------------------------------ _______________________________________________ Dyninst-api mailing list [email protected] https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
