RS <[EMAIL PROTECTED]> writes: >>> real 0m57.768s >>> user 0m40.878s >>> sys 0m7.812s
>> This says: most of the time is spent in user-level code. >> Such code consists of: >> 1. your own code (main exe, libraries, etc.), and >> 2. "system" libraries (libc.dylib (or whatever it's called on OSX)) > > OK Thanks, so most of the time is in the user level, which it makes > sense because my code has many calls to 2D convolutions, linear algebra, > and other user defined code that one would expect take some time to run. Hmm. That would imply that gprof is not working for you ... You should always start with a simple test case so you have some confidence in your tools. Here is a "proof" that gprof works on my system: [BEGIN PROOF] $ cat spin.c #include <stdlib.h> void spin(int x) { for (int i = 0; i < x; ++i) ; } void call_spin(int x) { for (int i = 0; i < x; ++i) spin(x); } int main(int argc, char *argv[]) { int count = 1000; if (1 < argc) count = atoi(argv[1]); call_spin(count); return 0; } $ gcc --std=c99 spin.c -g -pg $ time ./a.out 40000 real 0m14.347s user 0m14.335s sys 0m0.005s $ gprof ./a.out Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 100.59 144.17 144.17 40000 0.00 0.00 spin 0.01 144.19 0.02 1 0.02 144.19 call_spin Note that total seconds is very close to 10 * 'user time'. This is the first time I see such discrepancy, and a quick google search didn't reveal anything obvious. It probably has to do with HZ=100 in Linux 2.4 but HZ=1000 in 2.6 kernels. Also note that number of calls to spin() and call_spin() is correct. granularity: each sample hit covers 2 byte(s) for 0.01% of 144.19 seconds index % time self children called name 0.02 144.17 1/1 main [2] [1] 100.0 0.02 144.17 1 call_spin [1] 144.17 0.00 40000/40000 spin [3] ----------------------------------------------- <spontaneous> [2] 100.0 0.00 144.19 main [2] 0.02 144.17 1/1 call_spin [1] ----------------------------------------------- 144.17 0.00 40000/40000 call_spin [1] [3] 100.0 144.17 0.00 40000 spin [3] ----------------------------------------------- All of the above also make perfect sense. [END PROOF] If the output on your system differes from the above (e.g. you get no samples and 0.0 for 'self'), then clearly: - something is wrong with your system (or with gprof on Mac OS X), and - looking at results from your own program is pointless, since a trivial test doesn't work. > But I don't see how then > one can in general use gprof, because most of the time one doesn't have > access to system libraries that were built with -pg. I mean, one can > not recompile libc.a every time one wants to use a profiler You don't need to recompile libc.a every time. You compile it once, call it libc_p.a, and arrange to link against it when doing profiling. The exact details of how this is done are system-specific, and I have zero experience with Mac OS; perhaps someone else can guide you better. > Well, I know for fact that my user defined code takes some time to run > (extensive use of loops and user defined convoluted functions). In that case my previous theory doesn't hold water, and you must seek alternative explanation. One possible explanation is that gprof is broken on your system (or on Mac OS X; or on specific revisions of Mac OS and gcc ...). Cheers, -- In order to understand recursion you must first understand recursion. Remove /-nsp/ for email. _______________________________________________ help-gplusplus mailing list help-gplusplus@gnu.org http://lists.gnu.org/mailman/listinfo/help-gplusplus