Author: tim.bunce
Date: Fri Oct 31 06:39:25 2008
New Revision: 579
Modified:
trunk/Changes
trunk/lib/Devel/NYTProf.pm
Log:
Major update to docs about clock sources, SMP, processor affinity etc.
Added docs for DB::finish_profile.
Modified: trunk/Changes
==============================================================================
--- trunk/Changes (original)
+++ trunk/Changes Fri Oct 31 06:39:25 2008
@@ -36,19 +36,28 @@
Callers and timing information for xsubs are now shown at the
bottom of the corresponding source file.
References to xsubs in reports now include a working link
- if the xsub is in a package that contains perl code.
+ if the xsub is in a package that contains profiled perl code.
The html global subroutine index pages no longer list subs that
were never called.
Assorted report formating enhancements thanks to Gisle Aas.
- Added Devel::NYTProf::ReadStream module thanks to Gisle Aas.
-
Exclusive and Inclusive time column positions have been switched
to be consistent with how the times are presented elsewhere.
nytprofhtml includes a --open option to open the generated html
+
+Documentation:
+
+ Greatly expanded description of the clocks used for profiling
+ and thier issues, especially on multi-processor systems.
+
+Other:
+
+ Added Devel::NYTProf::ReadStream module which provides a perl
+ interface for reading the raw profile data, thanks to Gisle Aas.
+
=head2 Changes in Devel::NYTProf 2.05 (svn r498) 8th Oct 2008
Modified: trunk/lib/Devel/NYTProf.pm
==============================================================================
--- trunk/lib/Devel/NYTProf.pm (original)
+++ trunk/lib/Devel/NYTProf.pm Fri Oct 31 06:39:25 2008
@@ -331,15 +331,8 @@
happen so rarely relative to the activity of a most applications that you'd
have to run the code for many hours to have any hope of reasonably useful
results.
-It may be possible to use the C<clock=N> option to select a
-high-resolution cpu time clock. You can find the clocks available
-on you system using a command like:
-
- grep -r 'define *CLOCK_' /usr/include
-
-Look for a group that includes CLOCK_REALTIME. Documentation on these
clocks
-can be hard to find. I've not tried using these clocks yet. If you try it,
-please let us know how it works out.
+A better alternative would be to use the C<clock=N> option to select a
+high-resolution cpu time clock, if available on your system.
=head2 file=...
@@ -358,14 +351,14 @@
Systems which support the C<clock_gettime()> system call typically
support several clocks. By default NYTProf uses CLOCK_MONOTONIC.
+
This option enables you to select a different clock by specifying the
-integer id of the clock (which may vary between systems). If the clock
-you select isn't available then CLOCK_REALTIME is used.
+integer id of the clock (which may vary between operating system types).
+If the clock you select isn't available then CLOCK_REALTIME is used.
-This is a wizardly option and best avoided unless you really know what
-you're doing and understand the subtle differences between the clocks.
+See L</CLOCKS> for more information.
-=head1 SELECTIVE PROFILING
+=head1 RUN-TIME CONTROL OF PROFILING
You can profile only parts of an application by calling
DB::enable_profile()
and DB::disable_profile() at the appropriate moments.
@@ -373,6 +366,10 @@
Using the C<start=no> option lets you leave the profiler disabled until the
right moment, or circumstances, are reached.
+You can finish profiling completely by calling DB::finish_profile().
+This may be useful if perl is exiting abnormally, leaving the profile data
file
+in an incomplete state,
+
=head1 REPORTS
The L<Devel::NYTProf::Data> module provides a low-level interface for
loading
@@ -398,6 +395,116 @@
=back
+=head1 CLOCKS
+
+Here we discuss the way NYTProf gets high-resolution timing information
from
+your system and related issues.
+
+=head2 POSIX Clocks
+
+These are the clocks that your system may support if it supports the POSIX
+C<clock_gettime()> function. Other clock sources are listed in the
+L</Other Clocks> section below.
+
+The C<clock_gettime()> interface allows clocks to return times to
nanosecond
+precision. Of course few offer nanosecond I<accuracy> but the extra
precision
+helps reduce the cumulative error that naturally occurs when adding
together
+many timings. When using these clocks NYTProf outputs timings as a count
of 100
+nanosecond ticks.
+
+=head3 CLOCK_REALTIME
+
+CLOCK_REALTIME is typically the system's main high resolution 'wall clock
time'
+source. The same source as used for the gettimeofday() call used by most
kinds
+of perl benchmarking and profiling tools.
+
+If your system doesn't support clock_gettime() then NYTProf will use
+gettimeofday(), or the nearest equivalent,
+
+The problem with real time is that it's far from simple. It tends to drift
and
+then be reset to match 'reality', either sharply or by small adjustments
(via the
+adjtime() system call).
+
+Surprizingly, it can also go backwards, for reasons explained in
+http://preview.tinyurl.com/5wawnn
+
+=head3 CLOCK_MONOTONIC
+
+CLOCK_MONOTONIC rrepresents the amount of time since an unspecified point
in
+the past (typically system start-up time). It increments uniformally
+independent of adjustments to 'wallclock time'.
+
+=head3 CLOCK_VIRTUAL
+
+CLOCK_VIRTUAL increments only when the CPU is running in user mode on
behalf of the calling process.
+
+=head3 CLOCK_PROF
+
+CLOCK_PROF increments when the CPU is running in user I<or> kernel mode.
+
+=head3 CLOCK_PROCESS_CPUTIME_ID
+
+CLOCK_PROCESS_CPUTIME_ID represents the amount of execution time of the
process associated with the clock.
+
+=head3 CLOCK_THREAD_CPUTIME_ID
+
+CLOCK_THREAD_CPUTIME_ID represents the amount of execution time of the
thread associated with the clock.
+
+=head3 Finding Available POSIX Clocks
+
+On unix-like systems you can find the CLOCK_* clocks available on you
system
+using a command like:
+
+ grep -r 'define *CLOCK_' /usr/include
+
+Look for a group that includes CLOCK_REALTIME. The integer values listed
are
+the clock ids that you can use with the C<clock=N> option.
+
+A future version of NYTProf should be able to list the supported clocks.
+
+=head2 Other Clocks
+
+This section lists other clock sources that NYTProf may use.
+
+=head3 gettimeofday
+
+This is the traditional high resolution time of day interface for most
+unix-like systems. It's used on platforms like Mac OS X which don't
+(yet) support C<clock_gettime()>.
+
+With this clock NYTProf outputs timings as a count of 1 microsecond ticks.
+
+=for comment re high resolution timing for OS X:
+http://developer.apple.com/qa/qa2004/qa1398.html
+http://www.macresearch.org/tutorial_performance_and_time
+http://cocoasamurai.blogspot.com/2006/12/tip-when-you-must-be-precise-be-mach.html
+http://boredzo.org/blog/archives/2006-11-26/how-to-use-mach-clocks
+
+=head3 Time::HiRes
+
+On systems which don't support C<clock_gettime()> or C<gettimeofday()>
+NYTProf falls back to using the L<Time::HiRes> module.
+With this clock NYTProf outputs timings as a count of 1 microsecond ticks.
+
+=head2 Clock References
+
+Relevant specifications and manual pages:
+
+ http://www.opengroup.org/onlinepubs/000095399/functions/clock_getres.html
+ http://linux.die.net/man/3/clock_gettime
+
+Why 'realtime' can appear to go backwards:
+
+ http://preview.tinyurl.com/5wawnn
+
+=for comment
+http://preview.tinyurl.com/5wawnn redirects to:
+http://groups.google.com/group/comp.os.linux.development.apps/tree/browse_frm/thread/dc29071f2417f75f/ac44671fdb35f6db?rnum=1&_done=%2Fgroup%2Fcomp.os.linux.development.apps%2Fbrowse_frm%2Fthread%2Fdc29071f2417f75f%2Fc46264dba0863463%3Flnk%3Dst%26rnum%3D1%26
+
+=for comment - these links seem broken
+http://webnews.giga.net.tw/article//mailing.freebsd.performance/710
+http://sean.chittenden.org/news/2008/06/01/
+
=head1 LIMITATIONS
=head2 threads
@@ -409,7 +516,7 @@
For example, the Readonly module croaks with an "Invalid tie" when
profiled with
perl versions before 5.8.8. That's because L<Readonly> explicitly checking
for
-certain values from caller(). We're not quite sure what the cause is yet.
+certain values from caller().
=head2 Calls made via operator overloading
@@ -418,7 +525,7 @@
=head2 goto
-The C<goto &$sub;> isn't recognised as a subroutine call by the subroutine
profiler.
+The C<goto &foo;> isn't recognised as a subroutine call by the subroutine
profiler.
=head2 #line directives
@@ -427,6 +534,56 @@
=head1 CAVEATS
+=head2 SMP Systems
+
+Systems with multiple processors, which includes most modern machines, have
+
+From Linux docs (though applicable to most SMP systems):
+
+ The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are
realized on
+ many platforms using timers from the CPUs (TSC on i386, AR.ITC on
Itanium).
+ These registers may differ between CPUs and as a consequence these
clocks may
+ return bogus results if a process is migrated to another CPU.
+
+ If the CPUs in an SMP system have different clock sources then there is
no way
+ to maintain a correlation between the timer registers since each CPU
will run
+ at a slightly different frequency. If that is the case then
+ clock_getcpuclockid(0) will return ENOENT to signify this condition. The
two
+ clocks will then only be useful if it can be ensured that a process
stays on a
+ certain CPU.
+
+ The processors in an SMP system do not start all at exactly the same
time and
+ therefore the timer registers are typically running at an offset. Some
+ architectures include code that attempts to limit these offsets on
bootup.
+ However, the code cannot guarantee to accurately tune the offsets. Glibc
+ contains no provisions to deal with these offsets (unlike the Linux
Kernel).
+ Typically these offsets are small and therefore the effects may be
negligible
+ in most cases.
+
+In summary, SMP systems are likely to give 'noisy' profiles.
+Setting a L<Processor Affinity> may help.
+
+=head3 Processor Affinity
+
+Processor affinity is an aspect of task scheduling on SMP systems.
+"Processor affinity takes advantage of the fact that some remnants of a
process
+may remain in one processor's state (in particular, in its cache) from the
last
+time the process ran, and so scheduling it to run on the same processor the
+next time could result in the process running more efficiently than if it
were
+to run on another processor." (From
http://en.wikipedia.org/wiki/Processor_affinity)
+
+Setting an explicit processor affinity can avoid the problems described in
+L</SMP Systems>.
+
+Processor affinity can be set using the C<taskset> command on Linux.
+
+Future versions of NYTProf could support setting processor affinity
automatically
+(e.g. via sched_setaffinity() on Linux). Patches welcome!
+
+Note that processor affinity is inherited by child processes, so if the
process
+you're profiling spawns cpu intensive sub processes then your process will
be
+impacted by those more than it otherwise would.
+
=head2 Virtual Machines
I recommend you don't do performance profiling while running in a
@@ -460,6 +617,8 @@
Mailing list and discussion at
L<http://groups.google.com/group/develnytprof-dev>
+Blog posts L<http://blog.timbunce.org/tag/nytprof/> and
L<http://technorati.com/search/nytprof>
+
Public SVN Repository and hacking instructions at
L<http://code.google.com/p/perl-devel-nytprof/>
L<nytprofhtml> is a script included that produces html reports.
@@ -467,7 +626,10 @@
L<Devel::NYTProf::Reader> is the module that powers the report scripts.
You
might want to check this out if you plan to implement a custom report
(though
-it may be deprecated in a future release).
+it's very likely to be deprecated in a future release).
+
+L<Devel::NYTProf::ReadStream> is the module that lets you read a profile
data
+file as a stream of chunks of data.
=head1 AUTHOR
--~--~---------~--~----~------------~-------~--~----~
You've received this message because you are subscribed to
the Devel::NYTProf Development User group.
Group hosted at: http://groups.google.com/group/develnytprof-dev
Project hosted at: http://perl-devel-nytprof.googlecode.com
CPAN distribution: http://search.cpan.org/dist/Devel-NYTProf
To post, email: [email protected]
To unsubscribe, email: [EMAIL PROTECTED]
-~----------~----~----~----~------~----~------~--~---