Re: [networking-discuss] Re: [dtrace-discuss] Re: DTrace Network Provider

Brendan Gregg - Sun Microsystems Wed, 27 Sep 2006 12:11:48 -0700

G'Day Darren,

On Mon, Sep 25, 2006 at 04:00:02PM -0700, [EMAIL PROTECTED] wrote:
> Brendan Gregg - Sun Microsystems wrote:
[...]
> >I've already said that I want to measure code path latencies too, and that
> >I think it should be part of a provider. However, since kernel network
> >engineers understand the kernel network code, then they can get their own
> >statistics from fbt and some sdt, and write their own private provider
> >for their own pet needs. Are they also "customers" of DTrace? Sure, but
> >customers who have the skills to satisfy their own private needs.
> >
> 
> I think you're underestimating the complexities involved (or maybe
> I'm underestimating what dtrace can do?)


It's complex, sure, but where that complexity lives will make a big
difference. Having the kernel do all the work and dump associated events
for code path latency is probably going to be a complex enhancement.
The enabled probe effect may become an unwanted factor in the timing (we'd
have to test this), and as it would be coded into the kernel, it would be
set in stone - whether or not that is appropriate for a given script.

However, if the kernel were to dump some rather raw details, we could
try to move much of the complexity to a DTrace translator (such as
/usr/lib/dtrace/io.d), which can be tweaked as needed; and/or do
postprocessing of the details in a userland program - even Perl - to
reduce the inline enabled probe effect. Each script could approach the
raw data in a different way as appropriate.
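
As a rough sketch of the userland postprocessing idea (in Python rather
than Perl), assume the kernel side emitted nothing more than one line per
event of the form "<timestamp_ns> <packet_id> <stage>". That line format,
the packet identifier and the stage names below are all hypothetical; no
existing provider emits them. The script groups events by packet and
computes the time spent between consecutive stages:

```python
from collections import defaultdict

def stage_latencies(lines):
    """Return {(from_stage, to_stage): [delta_ns, ...]} from raw event lines.

    Each line is assumed to look like "<timestamp_ns> <packet_id> <stage>".
    This format is hypothetical, not the output of any existing provider.
    """
    events = defaultdict(list)          # packet_id -> [(timestamp_ns, stage)]
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue                    # skip blank or malformed lines
        ts, packet_id, stage = fields
        events[packet_id].append((int(ts), stage))

    deltas = defaultdict(list)
    for seen in events.values():
        seen.sort()                     # order each packet's events by timestamp
        for (t0, s0), (t1, s1) in zip(seen, seen[1:]):
            deltas[(s0, s1)].append(t1 - t0)
    return dict(deltas)

if __name__ == "__main__":
    # Made-up sample data; the stage names are illustrative only.
    sample = [
        "1000 pktA ip_input",
        "1200 pktB ip_input",
        "1500 pktA tcp_input",
        "2100 pktB tcp_input",
        "2600 pktA sock_queue",
    ]
    for (src, dst), values in sorted(stage_latencies(sample).items()):
        avg = sum(values) / len(values)
        print(f"{src} -> {dst}: avg {avg:.0f} ns over {len(values)} packets")
```

Because the events are keyed by packet rather than by thread, a packet
that sits on a queue between two stages is still accounted for, and a
different script is free to slice the same raw data another way.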

> What I'd like to see dtrace be able to do is follow a packet through
> the kernel, not a thread of execution.  fbt is for following execution.
> sdt is up to how it is used, but most often, it is just random hooks
> in the code.
> 
> The difference between using fbt and being able to follow a packet
> is when a packet is put on a queue, the fbt path is terminated,
> even though the packet still has places to go and I/O ports to see.
> 
> If the networking dtrace provider doesn't deliver the capability
> to trace a packet through the kernel then it is not delivering a
> key feature that is needed by many people.

No one is saying that the provider will never do this. If you are saying
that this feature is important to you, then sure - I understand.
And if many people means many engineers (which in turn helps them
achieve cool things), then I understand that too.

> In addition to the examples you've presented on the web page,
> I think it would be of benefit if the following questions could
> also be answered:
> 
> - how many PPS is an application responsible for being sent out?
> - how many PPS is an application responsible for on the receive side?
> The above two questions repeated for bytes per second, not PPS...or
> take all of your "questions answered" and repeat for PID.
>
> ...and I think this is the biggest problem with the dtrace networking
> provider as scoped so far - nothing in it relates to a process.  I'm
> aware that this can be challenging for networking (especially on the
> receive side) but there are worthwhile questions here to answer from
> the generic customer perspective.
> 

I completely agree, and this was the first networking issue I wanted
DTrace to solve - which led me to write tcpsnoop, tcptop, etc, 
which appear in the DTraceToolkit.

It's hard - I haven't worked out how best to add it to the provider -
but I do want it to be there. My tcpsnoop/tcptop scripts are
helpful in the meantime, but they break between Solaris versions
(as they are entirely fbt based). One day I hope to write stable
versions of tcpsnoop/tcptop.
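
To make the per-process questions concrete, here is a minimal sketch of
the aggregation involved, assuming some tracing layer can already hand us
one (pid, direction, bytes) record per packet. Producing that record is
exactly the hard part discussed above; the record layout and the function
name are hypothetical and are not what tcpsnoop or tcptop actually emit.

```python
from collections import defaultdict

def per_pid_rates(records, interval_s):
    """Summarize packets and bytes per second by (pid, direction).

    records: iterable of (pid, direction, nbytes), where direction is
    "send" or "recv". This record layout is an assumption.
    Returns {(pid, direction): (packets_per_second, bytes_per_second)}.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    packets = defaultdict(int)
    total_bytes = defaultdict(int)
    for pid, direction, nbytes in records:
        packets[(pid, direction)] += 1
        total_bytes[(pid, direction)] += nbytes
    return {
        key: (packets[key] / interval_s, total_bytes[key] / interval_s)
        for key in packets
    }

if __name__ == "__main__":
    # Made-up records collected over a 2 second interval.
    sample = [(100, "send", 1500), (100, "send", 500),
              (100, "recv", 40), (200, "recv", 1000)]
    for (pid, direction), (pps, bps) in sorted(per_pid_rates(sample, 2.0).items()):
        print(f"pid {pid} {direction}: {pps:.1f} pkt/s, {bps:.1f} bytes/s")
```

The same table answers both the PPS and the bytes per second versions of
the question, for send and receive, keyed by PID.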

> I'd dispute that "Are hackers/crackers port scanning my server? (TCP
> flag matching by IP address)" is actually a question worth asking.
> If it is directly connected to the Internet, the answer is "yes"
> (but maybe not right now.)  If there are any other boxes (routers,
> firewalls) along the way in from the Internet, then it's not a
> question you should be using dtrace to find an answer for.  If
> you're trying to come up with a way to justify that particular
> dtrace probe, I'd recommend looking for a better example question.

OK - I do think a sensible final solution is for customers to run a
NIDS (such as snort) somewhere on the network, or use tools to examine
their firewall logs.

Having a single DTrace script check for all types of scans and give
a report would still have some value: a customer who finds unexplained
network utilization can run a quick script to check what it is,
rather than install a NIDS if one isn't available.
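
As an illustration of what "TCP flag matching by IP address" could look
like, here is a small sketch of the matching logic, written in Python for
readability rather than as a DTrace script. It assumes each inbound
segment is available as (source_ip, destination_port, tcp_flags); the
tuple layout, the function names and the threshold of three ports are
assumptions for the example. The flag bit values are the standard TCP
header ones.

```python
from collections import defaultdict

# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def classify(flags):
    """Map a TCP flags byte to a scan style, or None if it looks ordinary."""
    flags &= 0x3F                       # ignore anything above the six classic bits
    if flags == 0:
        return "null"                   # no flags set at all
    if flags == FIN:
        return "fin"                    # FIN without ACK
    if flags == (FIN | PSH | URG):
        return "xmas"
    if flags == SYN:
        return "syn"                    # also how every normal connection starts
    return None

def suspected_scanners(packets, min_ports=3):
    """packets: iterable of (src_ip, dst_port, flags).

    Returns {(src_ip, kind): distinct_port_count} for sources that hit at
    least min_ports distinct ports with the same flag pattern. A bare SYN
    is normal on its own, so only the spread across ports is treated as
    suspicious; the threshold is an arbitrary choice for the example.
    """
    ports = defaultdict(set)            # (src_ip, kind) -> set of destination ports
    for src_ip, dst_port, flags in packets:
        kind = classify(flags)
        if kind is not None:
            ports[(src_ip, kind)].add(dst_port)
    return {key: len(seen) for key, seen in ports.items() if len(seen) >= min_ports}

if __name__ == "__main__":
    # Made-up traffic using documentation address ranges.
    sample = [("192.0.2.7", port, SYN) for port in (22, 23, 25, 80)]
    sample += [("192.0.2.9", 80, SYN), ("192.0.2.9", 80, ACK)]
    sample += [("198.51.100.3", port, FIN | PSH | URG) for port in (1, 2, 3)]
    for (ip, kind), count in sorted(suspected_scanners(sample).items()):
        print(f"{ip}: {kind} scan pattern across {count} ports")
```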

I'm sure there are better example questions with example DTrace solutions
that I can add to this site - I hope some people will start posting 
suggestions. :)

> Some other probes that might be useful:
> - RTT calculations from TCP timestamp measurements
> - TCP window size changes

These could be scripted from tcp::send and tcp::receive alone; but
separate probes may make sense.
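
For example, the RTT case only needs the TCP timestamp option fields from
the two events. The sketch below shows that logic in Python, assuming a
single connection and a time-ordered stream of (time_ns, kind, ts_field)
tuples, where a "send" carries the TSval we put in the segment and a
"recv" carries the TSecr the peer echoed back. The tuple layout and the
function name are hypothetical; the proposed probes do not define their
arguments yet.

```python
def rtt_samples(events):
    """Yield RTT samples in nanoseconds from send/receive events.

    events: iterable of (time_ns, kind, ts_field) in time order, for one
    connection. kind is "send" (ts_field is our TSval) or "recv"
    (ts_field is the peer's TSecr echo). This layout is an assumption.
    One sample is produced the first time each TSval is echoed.
    """
    first_sent = {}                     # TSval -> time of first segment carrying it
    for time_ns, kind, ts_field in events:
        if kind == "send":
            # Several segments can share a TSval; keep the earliest send time.
            first_sent.setdefault(ts_field, time_ns)
        elif kind == "recv" and ts_field in first_sent:
            yield time_ns - first_sent.pop(ts_field)

if __name__ == "__main__":
    # Made-up events: TSval 7 is sent twice and echoed twice.
    sample = [
        (1000, "send", 7), (1500, "send", 7), (4000, "recv", 7),
        (5000, "send", 8), (5200, "recv", 7), (9000, "recv", 8),
    ]
    print(list(rtt_samples(sample)))
```

Samples taken this way include any delay the peer adds before it ACKs,
so they are an upper bound on the wire RTT rather than an exact figure.
Window size changes could be tracked the same way, by remembering the
last advertised window per connection and reporting when it differs.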

> - when a packet is dropped because it is "bad" (checksum, etc)

Yep - we need a tcp::drop-checksum or some such.

> - look through "netstat -s" output as there is quite a large
>  number of stats there that are worthy of being identified
>  with a probe.

Sure, so long as we really do want to probe it and the kstat value
isn't sufficient. Thinking of sample scripts that use them should
help here.

> p.s the HREF for dtrace-discuss at the bottom of
> http://www.opensolaris.org/os/community/dtrace/NetworkProvider/
> is wrong/broken.

thanks,

Brendan

-- 
Brendan
[CA, USA]
_______________________________________________
networking-discuss mailing list
[email protected]
