Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

Lawrence Stewart Sun, 20 Jun 2010 04:56:02 -0700

On 06/20/10 21:15, Fabian Keil wrote:

Lawrence Stewart<lstew...@freebsd.org>  wrote:

On 06/20/10 03:58, Fabian Keil wrote:

Lawrence Stewart<lstew...@freebsd.org>   wrote:

On 06/13/10 18:12, Lawrence Stewart wrote:

The time has come to solicit some external testing for my SIFTR tool.
I'm hoping to commit it within a week or so unless problems are discovered.

I'm interested in all feedback and reports of success/failure, along
with details of the architecture tested and number of CPUs if you would
be so kind.


I got the following hand-transcribed panic maybe a second after
sysctl net.inet.siftr.enabled=1

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
[...]
current process = 12 (swi4: clock)
[ thread pid 12 tid 100006 ]
Stopped at      siftr_chkpkt+0xd0:      addq    $0x1,0x8(%r14)
db>   where
Tracing pid 12 tid 100006 td 0xffffff00034037e0
siftr_chkpt() at siftr_chkpkt+0xd0
pfil_run_hooks() at pfil_run_hooks+0xb4
ip_output() at ip_output+0x382
tcp_output() tcp_output+0xa41
tcp_timer_rexmt() at tcp_timer_rexmt+0x251
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop at ithread_loop+0x8e
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800003ad30, rbp = 0 ---


So I've tracked down the line of code where the page fault is occurring:

          if (dir == PFIL_IN)
                  ss->n_in++;
          else
                  ss->n_out++;

ss is a DPCPU (dynamic per-cpu) variable used to keep a set of stats
per-cpu and is initialised at the start of the function like so:

          ss = DPCPU_PTR(ss);

So for ss to be NULL, that implies DPCPU_PTR() is returning NULL on your
machine. I know very little about the inner workings of the DPCPU_*
macros, but I'm pretty sure the way I use them in SIFTR is correct or at
least as intended.


siftr_chkpkt() passes ss to siftr_chkreinject() before dereferencing
it itself. I think if ss was NULL, the panic should already occur in
siftr_chkreinject().

Yes but siftr_chkreinject() only dereferences ss in the exceptional caseof a malloc failure or duplicate pkt. It's unlikely either case happensfor you and so wouldn't trigger the panic.

To be sure I added:

diff --git a/sys/netinet/siftr.c b/sys/netinet/siftr.c
index 8bc3498..b9fdfe4 100644
--- a/sys/netinet/siftr.c
+++ b/sys/netinet/siftr.c
@@ -788,6 +788,16 @@ siftr_chkpkt(void *arg, struct mbuf **m, struct ifnet 
*ifp, int dir,
         if (siftr_chkreinject(*m, dir, ss))
                 goto ret;

+       if (ss == NULL) {
+           printf("ss is NULL");
+           ss = DPCPU_PTR(ss);
+           if (ss == NULL) {
+              printf("ss is still NULL");
+              goto ret;
+           }
+        }
+
+
         if (dir == PFIL_IN)
                 ss->n_in++;
         else

which doesn't seem to affect the problem.

As in it still panics and the "ss is NULL" message is not printed? Iwould have expected to at least see "ss is NULL" printed if myhypothesis was correct... hmm.

Perhaps the way I discovered the line number at which the panic occurredwas wrong. I compiled SIFTR on my amd64 dev server with "CFLAGS+=-g" inthe SIFTR Makefile to get debug symbols, ran "objdump -Sd siftr.ko | vim-", searched for the instruction reported in the panic message i.e."addq $0x1,0x8(%r14)" and then with a bit of trial and error, recompiledSIFTR with the line of code "volatile int blah = 0; blah = 2;" atvarious points in the function and looking at the change in the objdumpoutput to pinpoint which line of C code corresponded with the "addq"instruction.

The "volatile int blah = 0; blah = 2;" compiles to "movl$0x0,0xffffffffffffffd4(%rbp)" followed immediately by "movl$0x2,0xffffffffffffffd4(%rbp)". When I put that code above the "if (dir== PFIL_IN)" statement I see the objdump output show the assembly codebefore the "addq" instruction and when I move it after the if statementthe assembly code moves after the "addq" instruction.

Perhaps you could reproduce the above procedure and see if you identifythe same point in the siftr_chkpkt function I did for the instructionreferenced by the panic message?

Could you please go ahead and retest using a GENERIC kernel and see if
you can reproduce? There could be something in your custom kernel
causing the offsets or linker set magic used by the DPCPU bits to break
which in turn is triggering this panic in SIFTR.


I'll retry without pf first, and with GENERIC afterwards.


Sounds good, thanks.

Cheers,
Lawrence
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: [CFT] SIFTR - Statistical Information For TCP Research: Uncle Lawrence needs YOU!

Reply via email to