Nicolas Williams wrote:
On Wed, Sep 27, 2006 at 12:11:38PM -0700, Brendan Gregg - Sun Microsystems wrote:
On Mon, Sep 25, 2006 at 04:00:02PM -0700, [EMAIL PROTECTED] wrote:
The difference between using fbt and being able to follow a packet
is when a packet is put on a queue, the fbt path is terminated,
even though the packet still has places to go and I/O ports to see.

If the networking dtrace provider doesn't deliver the capability
to trace a packet through the kernel then it is not delivering a
key feature that is needed by many people.
No one is saying that the provider will never do this. If you are saying
that this feature is important to you, then sure - I understand.
And if many people means many engineers (which in turn helps them
achieve cool things), then I understand that too.

The fbt provider allows you to probe function calls/returns.  It still
requires that the scripter thread together the actual function calls
into a trace of call flow -- as far as the kernel side of DTrace is
concerned the fbt probe firings are discrete events.
[...]
If it takes associative arrays to do it then it will be expensive.
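To make that cost concrete, here is a minimal Python sketch (not DTrace
code) of what the scripter has to do: treat each probe firing as a
discrete event and stitch the events together with an associative array
keyed on the packet address.  The event tuples and the function names
(ip_input, enqueue, dequeue, tcp_input) are hypothetical placeholders,
not real kernel symbols.

```python
from collections import defaultdict

def thread_events(events):
    """Group discrete probe firings into per-packet paths.

    events: iterable of (timestamp, function_name, packet_address)
    tuples, a hypothetical stand-in for what fbt probes would record.
    """
    paths = defaultdict(list)        # the "associative array"
    for ts, func, pkt in sorted(events):
        paths[pkt].append(func)      # one lookup and append per firing
    return dict(paths)

events = [
    (1, "ip_input", 0x1000),
    (2, "ip_input", 0x2000),
    (3, "enqueue", 0x1000),    # the call flow ends here for this thread
    (7, "dequeue", 0x1000),    # picked up later, possibly elsewhere
    (8, "tcp_input", 0x1000),
]
paths = thread_events(events)
```

Every firing pays for a keyed lookup, which is the expense being
discussed; the packet address is the only thing linking the events on
either side of the queue.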

Generalizing somewhat away from networking, we would like tracing of
data objects, independent of the execution environment.

I started wondering if dtrace could make use of the "data breakpoint"
facility that many processors offer these days, labeling a data
object for triggering probes just by setting such a trap-on-access.
Two problems.  First, the precision may be too tight: it traps exactly
one address when you want a whole mblk's worth.  (The reverse, trapping
a cacheline or page, is OK; just filter in the fired-probe code.)
Second, processor breakpoint registers are a scarce resource.  We
would not be able to handle an indefinitely complex trace operation,
over several mblks at once.  The situation is obviously worse when
Fred across the hall is tracking disk I/O blocks on the system at
the same time.
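Both problems can be illustrated with a toy Python model.  It is only a
sketch under stated assumptions: the register count (4) and the trap
granularity (64 bytes) are made-up values, not those of any particular
processor, and the model covers the coarse-trap case where the fired
probe filters by address.

```python
class WatchRegisters:
    """Toy model of a small, fixed pool of data-breakpoint registers."""

    def __init__(self, count=4, granule=64):
        self.count = count        # assumed number of registers
        self.granule = granule    # assumed trap granularity in bytes
        self.watched = {}         # granule base -> (obj_start, obj_len)

    def watch(self, start, length):
        """Watch an object; returns False once the registers run out.

        Simplification: objects are assumed not to share a granule.
        """
        end = start + length - 1
        first = start - start % self.granule
        last = end - end % self.granule
        bases = list(range(first, last + 1, self.granule))
        if len(self.watched) + len(bases) > self.count:
            return False
        for base in bases:
            self.watched[base] = (start, length)
        return True

    def on_trap(self, addr):
        """Fired-probe filter: True only if addr is inside the object."""
        base = addr - addr % self.granule
        if base not in self.watched:
            return False
        start, length = self.watched[base]
        return start <= addr < start + length

regs = WatchRegisters()
first_ok = regs.watch(0x1010, 100)    # spans two 64-byte granules
second_ok = regs.watch(0x2000, 200)   # needs four more: does not fit
```

A trap at 0x1008 lands in a watched granule but outside the object, so
the filter rejects it; the second object cannot be watched at all.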

On the other hand, it's usable in user-space and kernel-space including
interrupt level, which is a good start.

To attack the first problem above, perhaps we should ask cpu designers
for data-breakpoint *range* [1]  facilities?

So what resource is less limited than cpu breakpoint registers but can
induce entry to specified code (a dtrace probe)?  Bingo: main memory
and ECC [2].  We can deliberately mark objects, to a precision
of an ECC code word, as bad - but list them in a tracking table which
the ECC handler checks for dtrace entry in preference to a real ECC
fault.
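The handler logic might look roughly like the following Python sketch.
It simulates only the table lookup; the 8-byte code-word size and the
class and method names are assumptions for illustration, and nothing
here touches real hardware.

```python
class EccTracker:
    """Toy tracking table consulted by a simulated ECC fault handler."""

    def __init__(self, word_size=8):
        self.word_size = word_size   # assumed ECC code-word size, bytes
        self.tracked = {}            # code-word base -> probe callback

    def mark(self, start, length, probe):
        """List every code word covering the object as deliberately bad."""
        base = start - start % self.word_size
        while base < start + length:
            self.tracked[base] = probe
            base += self.word_size

    def ecc_fault(self, addr):
        """Check the tracking table first; otherwise it is a real fault."""
        base = addr - addr % self.word_size
        probe = self.tracked.get(base)
        if probe is not None:
            probe(addr)
            return "probe"
        return "real-fault"

fired = []
tracker = EccTracker()
tracker.mark(0x1004, 10, fired.append)    # covers words 0x1000, 0x1008
hit = tracker.ecc_fault(0x1009)
miss = tracker.ecc_fault(0x1010)
```

Note that the whole code word is marked, so an access to a neighbouring
byte in the same word also fires; the probe code would have to filter.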

The non-enabled probe cost is zero.  The enabled probe cost is that
of a) rewriting the memory area in question with deliberately bad ECC
(a bit expensive to do to, e.g., every packet on a 10Gb link,
especially if a bad-ECC write is significantly slower than a normal
write), plus b) another rewrite with good ECC, plus c) the actual
probe code.
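As a back-of-the-envelope check, the three terms can be summed per
packet and multiplied by the packet rate.  The Python sketch below
assumes a 10 gigabit-per-second link; the packet size and the three
nanosecond costs are placeholders chosen only to make the arithmetic
easy, not measurements.

```python
def enabled_probe_overhead(link_bits_per_s, packet_bytes,
                           bad_write_ns, good_write_ns, probe_ns):
    """Return (packets per second, fraction of one CPU consumed)."""
    packets_per_s = link_bits_per_s / (packet_bytes * 8)
    per_packet_ns = bad_write_ns + good_write_ns + probe_ns  # a)+b)+c)
    return packets_per_s, packets_per_s * per_packet_ns / 1e9

# Placeholder inputs: 1250-byte packets, 500 + 200 + 300 ns per packet.
pps, cpu_fraction = enabled_probe_overhead(10e9, 1250, 500, 200, 300)
```

With these placeholder figures, one microsecond per packet at a million
packets per second would occupy an entire CPU, which is why the
prefiltering matters.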

There is a reliability downside: the tracked memory may no
longer be checked for real faults.
It helps if an ECC codepoint is available which is not used
by any hardware fault (or combination of faults).

Very large objects should be marked with bad-page-translations
in MMU tables rather than bad-ECC to reduce the costs [3].

Probes would have to be dynamically placed on memory areas; in the
dtrace world I suspect this would be best done as a result of the
firing of some traditional, code-based, probe - thus allowing
restriction of the cost of the enabled data probes by appropriate
prefiltering conditions.
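A minimal Python sketch of that arrangement follows.  The class, the
function, and the port-based predicate are all hypothetical; the point
is only that the code-based probe decides which objects get a data
probe, so unmatched packets never pay the enabled cost.

```python
class DataProbes:
    """Toy registry of memory areas that currently carry a data probe."""

    def __init__(self):
        self.armed = {}   # object address -> list of recorded accesses

    def arm(self, addr):
        self.armed.setdefault(addr, [])

    def access(self, addr, where):
        """Simulated memory access; recorded only if the area is armed."""
        if addr in self.armed:
            self.armed[addr].append(where)

def code_probe(probes, pkt_addr, dst_port, want_port=80):
    """Traditional probe with a predicate; arms a data probe on a match."""
    if dst_port == want_port:
        probes.arm(pkt_addr)

probes = DataProbes()
code_probe(probes, 0x1000, 80)    # matches the prefilter: armed
code_probe(probes, 0x2000, 22)    # filtered out: never armed
probes.access(0x1000, "queue")
probes.access(0x2000, "queue")
```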

- Jeremy Harris


Footnotes:
[1], [2], [3]  These may be patentable ideas.  Since I no longer
          work for Sun, they are hereby placed in the public
          domain, with no restrictions.
_______________________________________________
networking-discuss mailing list
[email protected]
