Hi Andre,
thank you for providing a detailed report.
The patch in question avoids passing received information to both the rx and
tx halves of a CCID. There are two possibilities:
 (a) The patch itself has a bug. It has been tested extensively over several
     months, but mainly with CCID3, as CCID2 gave bad performance.
or
 (b) There is a problem in the CCID2 code, which has seen little maintenance
     while there have been many changes elsewhere in the kernel.
I would much rather hunt down this bug than revert the patch, since reverting
it means that almost twice the work is done per packet, and truly
bidirectional connections are not currently supported.
In initial tests here I could not reproduce the bug, and I had difficulties
since I do not have all the libraries required for paraslash.
1. I see you are using loopback (127.0.0.1) where packets are routed back
internally: does the bug persist when CCID2 sender and CCID2 receiver
are on different hosts?
2. This is using CCID2, which has not been maintained for a while. Can
you please also try CCID3, e.g. by using the following sysctls:
    sysctl -w net.dccp.default.rx_ccid=3
    sysctl -w net.dccp.default.tx_ccid=3
    sysctl -w net.dccp.default.tx_qlen=5
    sysctl -w net.dccp.default.seq_window=100
    sysctl -w net.dccp.default.send_ackvec=0
3. Caveat: `Server' listens, `Client' connects. I could not build the
paraslash app due to missing libraries, but found that dccp_recv_open() calls
connect() in dccp_recv.c, while dccp_open() in dccp_send.c calls listen().
It seems that the roles are reversed. Is it possible to swap this in the
application, and does the problem persist?
4. Notwithstanding (3), the BUG() condition in mod_timer should not be
triggered, so any further information, in particular from the tests in (1)
and (2), would be good to have.
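
For reference on point 3, here is a minimal user-space sketch of the
conventional role split, assuming a Linux host with DCCP support. The names
example_dccp_listen() and example_dccp_connect() are made up for illustration
and are not taken from paraslash; no service code is set and error handling is
reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOCK_DCCP
#define SOCK_DCCP 6       /* socket type used by Linux for DCCP */
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33   /* IANA protocol number of DCCP */
#endif

/* Server role: bind() and listen(). Returns the listening fd or -1.
 * The caller still has to accept() a connection before sending data. */
int example_dccp_listen(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 5) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Client role: connect() to a dotted-quad address. Returns the fd or -1. */
int example_dccp_connect(const char *ip, unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
	    connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Either side may send data once the connection is established; the sketch only
shows which end listens and which end connects.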
Gerrit
| ccid2_hc_tx_send_packet: pipe=1 cwnd=1
| ------------[ cut here ]------------
| kernel BUG at kernel/timer.c:407!
| invalid opcode: 0000 [#1]
| PREEMPT
| CPU: 0
| EIP: 0060:[<c0124ace>] Not tainted VLI
| EFLAGS: 00210246 (2.6.20.1 #12)
| EIP is at mod_timer+0x28/0x2c
| eax: c15807d4 ebx: c1580414 ecx: 00000000 edx: fffbda11
| esi: c1580414 edi: dea75644 ebp: dc637db8 esp: dc637db8
| ds: 007b es: 007b ss: 0068
| Process para_server (pid: 1085, ti=dc636000 task=de5df030 task.ti=dc636000)
| Stack: dc637dc4 c0369b66 00000001 dc637df8 c0407ddc dc637de0 c15804bc 00200292
| dc637de4 c041588c c15804b0 dc637df8 00000000 dea75644 c1580414 dc7b2ec4
| dc637e1c c040971d dc637e08 dc637e8c 00000000 00000000 c0529780 c1580414
| Call Trace:
| [<c0103c3b>] show_trace_log_lvl+0x1a/0x30
| [<c0103cf2>] show_stack_log_lvl+0x8d/0xaa
| [<c0103f07>] show_registers+0x1a8/0x312
| [<c0104210>] die+0x109/0x226
| [<c01043ab>] do_trap+0x7e/0xb4
| [<c0104684>] do_invalid_op+0xa3/0xad
| [<c0415b5c>] error_code+0x74/0x7c
| [<c0369b66>] sk_reset_timer+0xf/0x19
| [<c0407ddc>] dccp_write_xmit+0x224/0x22c
| [<c040971d>] dccp_sendmsg+0x10a/0x15a
| [<c03a9e26>] inet_sendmsg+0x44/0x55
| [<c036666a>] do_sock_write+0x9a/0xa9
| [<c03666e3>] sock_aio_write+0x6a/0x7b
| [<c015afc6>] do_sync_write+0xc7/0x116
| [<c015b197>] vfs_write+0x182/0x187
| [<c015b23d>] sys_write+0x3d/0x64
| [<c0102fa4>] syscall_call+0x7/0xb
| =======================
The BUG is triggered via the following chain:
1. dccp_sendmsg() calls dccp_write_xmit(sk, 0), i.e. with block == 0.
2. ccid2_hc_tx_send_packet() finds
   hctx->ccid2hctx_pipe >= hctx->ccid2hctx_cwnd
   (see above, pipe=cwnd=1) ==> returns 1
3. Back in dccp_write_xmit(sk, 0):
       if (!block) {   /* this is true here */
               sk_reset_timer(sk, &dp->dccps_xmit_timer,
                              msecs_to_jiffies(err)+jiffies)
   ==> mod_timer() hits the BUG()
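
The chain above can be played through in a small user-space model. This is
only a sketch with simplified stand-in types (all model_* names are invented
here and are not kernel symbols). It assumes, without having checked the
2.6.20.1 source, that the BUG at kernel/timer.c:407 is mod_timer() refusing a
timer that has no handler function installed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the kernel's real structures. */
struct model_timer {
	void (*function)(unsigned long data); /* NULL until the timer is set up */
	unsigned long expires;
	int pending;
};

struct model_ccid2 {
	unsigned int pipe; /* packets currently in flight */
	unsigned int cwnd; /* congestion window, in packets */
};

enum model_result {
	MODEL_SENT,        /* window had room, packet went out */
	MODEL_TIMER_ARMED, /* non-blocking: retry timer scheduled */
	MODEL_WOULD_WAIT,  /* blocking: caller would sleep instead */
	MODEL_WOULD_BUG    /* timer armed without a handler */
};

/* Placeholder handler so that a "properly set up" timer can be modelled. */
void model_noop_handler(unsigned long data)
{
	(void)data;
}

/* Step 2: a non-zero return means "window full, retry in that many ms". */
int model_ccid2_send_packet(const struct model_ccid2 *hc)
{
	assert(hc != NULL);
	return hc->pipe >= hc->cwnd ? 1 : 0;
}

/* Step 3: the non-blocking branch re-arms the transmit timer. */
enum model_result model_write_xmit(struct model_ccid2 *hc,
				   struct model_timer *timer,
				   int block, unsigned long now)
{
	int err = model_ccid2_send_packet(hc);

	if (err == 0) {
		hc->pipe++;
		return MODEL_SENT;
	}
	if (block)
		return MODEL_WOULD_WAIT;
	if (timer->function == NULL)
		return MODEL_WOULD_BUG; /* assumed analogue of the mod_timer BUG */
	timer->expires = now + (unsigned long)err; /* ms treated as ticks here */
	timer->pending = 1;
	return MODEL_TIMER_ARMED;
}
```

With pipe=cwnd=1, block=0 and a timer whose function pointer was never set,
the model ends in MODEL_WOULD_BUG; with a handler installed the same call only
arms the timer.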
| <7>dccp_set_state: listen(c1580030) LISTEN -> CLOSED
This may be a clue: this socket never got past the LISTEN state (i.e. it has
not entered the server state).