ERRATA :
clock_gettime :
Average : 0.000000272516616 sec, 272.51662 nsec
Standard Deviation : 0.000000008640484 sec, 8.64048 nsec

Sorry, bad copy-paste... The variation is actually quite big.

On 10-07-07 11:28 AM, David Goulet wrote:
On 10-07-06 03:39 PM, Nils Carlson wrote:
Cool, so the measurements came through...


I've retested UST per-event time with the new commit made a few days ago
fixing the custom probes and cache-line alignment. Here are the results
for the TSC counter and clock_gettime (test run 1000 times on an i7):

rdtsc :
Average : 0.000000242229708 sec, 242.22971 nsec
Standard Deviation : 0.000000001663147 sec, 1.66315 nsec

clock_gettime :
Average : 0.000000272516616 sec, 272.51662 nsec
Standard Deviation : 0.000000002340784 sec, 2.34078 nsec

What I would like to see is automatic detection of whether the rdtsc
instruction is usable. A test for this already exists in the kernel; the
question is whether that information is currently exported, or whether
we need to submit a patch to export it.
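Part of that information is already visible from userspace today: the
kernel publishes the constant_tsc and nonstop_tsc feature flags in
/proc/cpuinfo. A minimal sketch of checking for them (the helper name is
mine, and this does not cover everything the kernel's own TSC
synchronization test checks):

#include <stdio.h>
#include <string.h>

/* Return 1 if the first "flags" line of /proc/cpuinfo mentions the
 * given feature flag (e.g. "constant_tsc"), 0 otherwise. */
static int cpuinfo_has_flag(const char *flag)
{
	FILE *f = fopen("/proc/cpuinfo", "r");
	char line[4096];
	int found = 0;

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "flags", 5) == 0 && strstr(line, flag)) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}

int main(void)
{
	printf("constant_tsc: %d, nonstop_tsc: %d\n",
	       cpuinfo_has_flag("constant_tsc"),
	       cpuinfo_has_flag("nonstop_tsc"));
	return 0;
}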


From userspace, to test this, it would be a syscall via prctl, right? The
thing is that it's needed at compile time. Right now, the __i386__ and
__x86_64__ defines are tested. At gcc compile time, it would be great to
have something like a TSC_AVAILABLE define and then compile the right
function (either clock_gettime or rdtsc).
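Roughly something like this, purely as a sketch (TSC_AVAILABLE is a
hypothetical define set by the build system, not something UST or gcc
provides today, and trace_clock_read is just an illustrative name):

#include <stdint.h>
#include <time.h>

#ifdef TSC_AVAILABLE
/* Build system decided the TSC is usable on the target. */
static inline uint64_t trace_clock_read(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}
#else
/* Safe fallback: monotonic clock in nanoseconds. */
static inline uint64_t trace_clock_read(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
#endif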

However, there are some consistency issues with using the TSC, for
example between the counters of different CPUs... so I think we need to
be very careful about that, even if it is about 30 ns faster per event
and much more _stable_ (see the standard deviation).

David

Then we should probably start looking at a simple selection mechanism,
probably a function pointer?
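As a rough illustration of the function-pointer approach (all names here
are made up for the example, not taken from the UST tree):

#include <stdint.h>
#include <time.h>

typedef uint64_t (*trace_clock_fn)(void);

static uint64_t trace_clock_gettime(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

#if defined(__i386__) || defined(__x86_64__)
static uint64_t trace_clock_rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}
#endif

/* Default to the safe clock; switched once at library init if the TSC
 * was detected as usable on this machine. */
static trace_clock_fn trace_clock_read = trace_clock_gettime;

void trace_clock_select(int tsc_usable)
{
#if defined(__i386__) || defined(__x86_64__)
	if (tsc_usable)
		trace_clock_read = trace_clock_rdtsc;
#endif
	(void)tsc_usable;
}

One caveat is that the indirect call adds a little overhead on the hot
path, which might be noticeable at the ~250 ns per-event scale measured
above.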

/Nils
On Jul 6, 2010, at 8:12 PM, David Goulet wrote:

Hey,

After some talks with Nils from Ericsson, there were some questions
about using the TSC counter instead of clock_gettime in include/ust/clock.h.

I ran some tests after the meeting and was quite surprised by the
overhead of clock_gettime.

On an average run...
WITH clock_gettime : ~266 ns per event
WITH rdtsc instruction : ~235 ns per event

And it is systematic... I'm getting stable results with rdtsc, with a
standard deviation of ~2 ns.
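For reference, here is a minimal standalone sketch of the kind of
comparison I ran. It is not the actual UST per-event test; it only
isolates the clock-read cost, and the numbers will vary with the CPU and
frequency governor (older glibc needs -lrt for clock_gettime):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 1000000

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	volatile uint64_t sink = 0;
	uint64_t start, end;
	int i;

	/* Cost of one clock_gettime() call, averaged over many calls. */
	start = now_ns();
	for (i = 0; i < ITERATIONS; i++)
		sink += now_ns();
	end = now_ns();
	printf("clock_gettime: %.1f ns/call\n",
	       (double)(end - start) / ITERATIONS);

	/* Cost of one rdtsc, averaged the same way. */
	start = now_ns();
	for (i = 0; i < ITERATIONS; i++)
		sink += rdtsc();
	end = now_ns();
	printf("rdtsc:         %.1f ns/call\n",
	       (double)(end - start) / ITERATIONS);

	(void)sink;
	return 0;
}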

From what little I know of the TSC, one thing is for sure: with SMP it
becomes much more "fragile" to rely on, because we have no assurance
that the counters are coherent between CPUs, and there is also the CPU
frequency scaling policy (ondemand is now the default on Ubuntu). Newer
CPUs support the constant_tsc and nonstop_tsc flags, but that is still
only a small subset of them.
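For what it's worth, the "invariant TSC" property those flags correspond
to can also be queried directly with CPUID (leaf 0x80000007, EDX bit 8).
A small sketch; note that even an invariant TSC does not guarantee
synchronization between packages:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	/* CPUID.80000007H:EDX[8] = invariant TSC: runs at a constant rate
	 * and does not stop in deep C-states. */
	if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx) &&
	    (edx & (1u << 8)))
		printf("invariant TSC\n");
	else
		printf("TSC not invariant; clock_gettime is the safe choice\n");
	return 0;
}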

Right now, UST forces the use of clock_gettime even when i386 or
x86_64 is used.
Should a change be considered?

Thanks
David

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev


