Comments/questions inline for Dave and Arnaldo (or anybody else!):

>
> ip_queue_xmit:
>      push       %ebp
>      push       %edi
>      push       %esi
>      push       %ebx
>      sub        $0xbc, %esp
>      mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
>      mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
>      mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt

How did you get the assembler code? I'm curious so that I can do the same in the future.

> So find out what path in the DCCP stack allows an output packet
> SKB to not have skb->sk initialized. :-)
>
> This funny business with only doing a skb_set_owner_w(skb, sk)
> in dccp_transmit_skb() if the SKB is cloned is probably part
> of the problem.
>
> For example, the first OOPS goes back into dccp_retransmit_skb().  If
> the skb is already cloned, it makes a copy using pskb_copy() and

Can you only clone an skb once, or as many times as you like? I had a
quick look at the code for skb_clone and it looks like you can clone
repeatedly. If that is the case, Arnaldo, why did you choose to use
pskb_copy?

> passes that to dccp_transmit_skb().  That won't pass the
> "skb_cloned()" test, and will likely leave us with a NULL skb->sk.
>
> There needs to therefore be a better way to test "not DATA packet" in
> dccp_transmit_skb(), because "skb_cloned()" obviously does not always
> indicate that.
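
To check that I follow this, here is a toy userspace model of the path
Dave describes. It is only a sketch: toy_sock, toy_skb and the toy_*
helpers are names I made up, the refcount stands in for the shared-data
check behind skb_cloned(), and nothing here is the real kernel code. It
assumes, as in the description above, that the owner is set only when the
skb tests as cloned and that retransmit copies when the queued skb is
already cloned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy userspace stand-ins; none of these are the real kernel structures. */
struct toy_sock { int id; };

struct toy_data {
	int refs;             /* how many toy_skb headers share this payload */
	char bytes[16];
};

struct toy_skb {
	struct toy_sock *sk;  /* owner; NULL until someone sets it */
	struct toy_data *data;
};

/* Fresh header with a private payload (nothing is freed in this sketch). */
static struct toy_skb *toy_alloc(const char *payload)
{
	struct toy_skb *skb = calloc(1, sizeof(*skb));
	assert(skb);
	skb->data = calloc(1, sizeof(*skb->data));
	assert(skb->data);
	skb->data->refs = 1;
	strncpy(skb->data->bytes, payload, sizeof(skb->data->bytes) - 1);
	return skb;
}

/* Clone-like: new header, same payload, refcount goes up, sk stays NULL. */
static struct toy_skb *toy_clone(const struct toy_skb *orig)
{
	struct toy_skb *skb = calloc(1, sizeof(*skb));
	assert(skb);
	skb->data = orig->data;
	skb->data->refs++;
	return skb;
}

/* Copy-like: new header and a private payload, sk stays NULL. */
static struct toy_skb *toy_copy(const struct toy_skb *orig)
{
	return toy_alloc(orig->data->bytes);
}

/* Stand-in for the "is this payload shared?" test. */
static int toy_cloned(const struct toy_skb *skb)
{
	return skb->data->refs > 1;
}

/* Mirrors the conditional discussed above: owner set only for shared data. */
static void toy_transmit(struct toy_sock *sk, struct toy_skb *skb)
{
	if (toy_cloned(skb))
		skb->sk = sk;
}

/* Mirrors the retransmit choice described above: copy if already cloned. */
static struct toy_skb *toy_retransmit(struct toy_sock *sk, struct toy_skb *queued)
{
	struct toy_skb *out = toy_cloned(queued) ? toy_copy(queued)
						 : toy_clone(queued);
	toy_transmit(sk, out);
	return out;
}
```

Retransmitting twice from the same queued toy_skb while the first clone is
still around gives an owner on the first result and a NULL sk on the
second, which is the case that would then blow up further down.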

I tried altering dccp_retransmit_skb so that it just clones unconditionally, i.e.:
        return dccp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC));

and also altered dccp_transmit_skb so that it sets the owner unconditionally, like:
-               if (skb_cloned(skb))
                        skb_set_owner_w(skb, sk);

I tried one of these at a time.
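
For reference, this is how I think of the two changes, as a toy userspace
sketch rather than the real code. The toy_* names and the two flags are
invented for illustration; always_clone stands for the first change and
always_set_owner for the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not the kernel types. */
struct toy_sock { int id; };

struct toy_skb {
	struct toy_sock *sk;    /* owner, NULL until set */
	bool shared;            /* true if the payload is shared with another header */
};

struct toy_cfg {
	bool always_clone;      /* first change: retransmit never copies */
	bool always_set_owner;  /* second change: transmit drops the cloned test */
};

static void toy_transmit(const struct toy_cfg *cfg, struct toy_sock *sk,
			 struct toy_skb *skb)
{
	if (cfg->always_set_owner || skb->shared)
		skb->sk = sk;
}

/* Returns the header that would be handed on to the IP layer. */
static struct toy_skb toy_retransmit(const struct toy_cfg *cfg,
				     struct toy_sock *sk,
				     struct toy_skb *queued)
{
	struct toy_skb out = { NULL, false };

	if (cfg->always_clone || !queued->shared) {
		/* clone: the payload is now shared by both headers */
		queued->shared = true;
		out.shared = true;
	}
	/* else copy: private payload, out.shared stays false */

	toy_transmit(cfg, sk, &out);
	return out;
}
```

With both flags off, retransmitting an already shared toy_skb leaves sk
NULL; with either flag on, the owner is set. That only covers this one
path, of course.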

In both cases ttcp now times out, which is good.

What's not good, however, is that performance was also really bad (10 kb
per sec), and I also get crashes like this after the timeout:

ttcp-t: buflen=256, nbuf=100, align=16384/+0, port=5001  dccp(inet) -> 10.0.2.3
ttcp-t: socket
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: dccp_ccid3 dccp_tfrc_lib dccp e100 3c59x
CPU:    0
EIP:    0060:[<00000000>]    Not tainted VLI
EFLAGS: 00010286   (2.6.14-rc3)
EIP is at 0x0
eax: c0440000   ebx: 00000000   ecx: 00000000   edx: 00000100
esi: 00000000   edi: 00000000   ebp: 00000000   esp: c0441f00
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0440000 task=c03bfba0)
Stack: 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000
       00000000 bfc49bd0 00000000 00000000 00000000 00000000 ffffffff ffffffff
       ffffffff 00000000 00000006 0000000f c6f64000 c6f64000 00000000 00000000
Call Trace:
Code:  Bad EIP value.

Now if I make both of those changes at the same time I get my
performance back, but I get the following crash, so there must be more
than one place where sk is not getting set in the skb:

Unable to handle kernel NULL pointer dereference at virtual address 0000013c
 printing eip:
c031ad24
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: dccp_ccid3 dccp_tfrc_lib dccp e100 3c59x
CPU:    0
EIP:    0060:[<c031ad24>]    Not tainted VLI
EFLAGS: 00010292   (2.6.14-rc2)
EIP is at ip_queue_xmit+0x14/0x4c0
eax: c8887ba0   ebx: 00000000   ecx: 00000001   edx: 00000000
esi: c7353aac   edi: c7353ac0   ebp: c70899e0   esp: c043de08
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c043c000 task=c03bcba0)
Stack: c0357620 c0374d7d 00000004 c037647d c037641b 00000000 00000000 00000000
       00000000 c01d32d3 0000005b c0357620 c0374d7d c7b66560 00000000 00000246
       c7dc44c8 00000048 c0357620 c0374d7d c75fb620 00000000 c043deb8 00000009
Call Trace:
 [<c01d32d3>] acpi_ev_sci_xrupt_handler+0x3f/0x46
 [<c0134140>] handle_IRQ_event+0x30/0x70
 [<c887e0ef>] dccp_v4_checksum+0x3f/0xb0 [dccp]
 [<c887d9fe>] dccp_v4_send_check+0x2e/0x40 [dccp]
 [<c88809d8>] dccp_transmit_skb+0x2f8/0x380 [dccp]
 [<c8882e3a>] dccp_retransmit_timer+0x4a/0x190 [dccp]
 [<c011ed85>] update_process_times+0x85/0x130
 [<c8882f80>] dccp_write_timer+0x0/0xa0 [dccp]
 [<c8882fe4>] dccp_write_timer+0x64/0xa0 [dccp]
 [<c0106f22>] timer_interrupt+0x42/0x60
 [<c011ef06>] run_timer_softirq+0xb6/0x1b0
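
As a sanity check on the fault address: ebx is 00000000 in the register
dump and the disassembly quoted at the top loads from 0x13c(%ebx), so a
NULL skb->sk would produce exactly this address. A minimal sketch of that
arithmetic, where toy_inet_sock and its padding are made up and only the
0x13c offset comes from the disassembly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x13c offset is taken from the disassembly. */
struct toy_inet_sock {
	unsigned char other_fields[0x13c]; /* padding standing in for earlier members */
	uint32_t opt;                      /* 32-bit pointer slot, as on i386 */
};

/*
 * Address the CPU would touch when loading base->opt, computed from the
 * member offset without dereferencing anything.
 */
static uintptr_t opt_load_address(uintptr_t base)
{
	return base + offsetof(struct toy_inet_sock, opt);
}
```

Calling opt_load_address(0) gives 0x13c, the same value as the "virtual
address 0000013c" in the oops.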

Anyway, I'll keep on hunting, but I don't know how far I'll get. What I
really need to do is understand the retransmit path better.

Hopefully this gives others some clues.

Ian