Hi Godbach,

On Wed, Sep 26, 2012 at 02:57:49PM +0800, Godbach wrote:
> Hi, Willy
> 
> Actually, last time I tested sending the network interrupts to the
> other three cores, excluding the core haproxy runs on, but found no
> difference from sending them to a single core. And the NICs didn't
> drop any packets.

OK.

> However, I will test it again later and check the output of
> 'ethtool -S', the network interrupt counts, and so on.
> 
> I found some warning messages from dmesg:
> 
> [142654.793193] ------------[ cut here ]------------
> [142654.793395] WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x54/0x150()
> [142654.793972] Hardware name: System Product Name
> [142654.794573] cleanup rbuf bug: copied 68D1EF11 seq 68CFF65F rcvnxt 68D3565D
> [142654.795165] Modules linked in: ixgbe(O) binfmt_misc 8021q fcoe
> garp stp llc libfcoe libfc scsi_transport_fc scsi_tgt ip6t_REJECT nf
> _conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
> ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel
> snd_hda_codec nouveau snd_hwdep igb snd_seq snd_seq_device snd_pcm
> eeepc_wmi asus_wmi ttm drm_kms_helper sparse_keymap snd_timer snd drm
> coretemp rfkill i2c_algo_bit mxm_wmi wmi video lpc_ich mfd_core
> crc32c_intel r8169 i2c_i801 i2c_core soundcore snd_page_alloc mii mdio
> pcspkr serio_raw ghash_clmulni_intel microcode uinput [last unloaded:
> ixgbe]
> [142654.798215] Pid: 18374, comm: haproxy Tainted: G        W  O 3.5.0 #1
> [142654.798838] Call Trace:
> [142654.799440]  [<ffffffff810422cf>] warn_slowpath_common+0x7f/0xc0
> [142654.800031]  [<ffffffff810423c6>] warn_slowpath_fmt+0x46/0x50
> [142654.800653]  [<ffffffff8147c270>] ? sock_pipe_buf_release+0x20/0x20
> [142654.801237]  [<ffffffff814cf294>] tcp_cleanup_rbuf+0x54/0x150
> [142654.801847]  [<ffffffff814d0ae1>] tcp_read_sock+0x1b1/0x200
> [142654.802440]  [<ffffffff81472777>] ? sock_sendpage+0x27/0x30
> [142654.803037]  [<ffffffff814ccd60>] ? tcp_done+0x90/0x90
> [142654.803644]  [<ffffffff814d0bf0>] tcp_splice_read+0xc0/0x250
> [142654.804239]  [<ffffffff814726b2>] sock_splice_read+0x62/0x80
> [142654.804843]  [<ffffffff8118c73b>] do_splice_to+0x7b/0xa0
> [142654.805457]  [<ffffffff8118e850>] sys_splice+0x540/0x560
> [142654.806040]  [<ffffffff8159aed2>] system_call_fastpath+0x16/0x1b
> [142654.806646] ---[ end trace 46d7fb693af33fde ]---
> 
> It seems that this bug should have been resolved by commit
> 1ca7ee30630e1022dbcf1b51be20580815ffab73 before 3.5.0 was released. But
> it still appears in kernel 3.5.0, and has even been reported against
> kernel 3.5.3, as described at: https://bugzilla.redhat.com/show_bug.cgi?id=854367

Not good at all... You should report it to the netdev mailing list,
which is where the network developers are. TCP splicing has changed in
3.5 (making it much more efficient) but it is possible some bugs remain.
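For the report, it may help to spell out what the three numbers in the
warning mean. The sketch below is only an illustration: it assumes the
check behind the message is the usual wraparound-aware 32-bit sequence
comparison, and that "seq" is the end sequence of the buffer being
examined. The helper name before() and the variable names are chosen
for this example.

```python
MASK32 = 0xFFFFFFFF


def before(seq1: int, seq2: int) -> bool:
    """Return True if seq1 precedes seq2 in 32-bit wraparound arithmetic."""
    # A difference with the top bit set is negative when read as signed 32-bit.
    return ((seq1 - seq2) & MASK32) >= 0x80000000


# Values copied from the "cleanup rbuf bug" line in the quoted trace.
copied = 0x68D1EF11
seq = 0x68CFF65F
rcvnxt = 0x68D3565D

# Assumption: the warning fires when "copied" is NOT before "seq".
print("copied before seq:", before(copied, seq))
print("bytes copied past seq:", (copied - seq) & MASK32)
print("bytes between copied and rcvnxt:", (rcvnxt - copied) & MASK32)
```

Under those assumptions, "copied" is 129202 bytes past "seq" while
still being behind "rcvnxt", which is the inconsistent state the
message complains about.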

3.5 still seems a bit young. Yesterday I got repeated panics when using
a bridge on 3.5.4, and I had to fall back to 3.4.11 to get rid of them.
Not bisected yet.

> In my opinion, judging from the description of this bug, it is possible
> that splice will not work very well.

It's very likely. Anyway, as soon as you get such a trace in your logs,
you can't trust anything at all anymore.

I see you have the Tainted flag set. Maybe you're using some other
crappy drivers that you don't absolutely need and that do not cope well
with recent kernels?
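As a rough aid for reading the taint letters in a trace line such as
the "Pid: ... Tainted: G W O" one quoted above, here is a minimal
sketch. The letter table is deliberately partial and written from
memory of the kernel's documented taint flags, so treat it as an
assumption and check it against the documentation of the kernel you
run. The names TAINT_LETTERS and decode_taint are made up for this
example.

```python
import re

# Partial table of taint letters (assumption: verify against your kernel docs).
TAINT_LETTERS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded",
    "F": "module was force-loaded",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page reference or unexpected page flags",
    "D": "kernel has oopsed before",
    "W": "a warning was issued earlier",
    "C": "staging driver loaded",
    "O": "out-of-tree module loaded",
}


def decode_taint(line: str) -> list[tuple[str, str]]:
    """Extract the taint letters from a trace line and describe each one."""
    match = re.search(r"Tainted:\s+((?:[A-Z]\s+)+)", line)
    if match is None:
        return []
    return [(c, TAINT_LETTERS.get(c, "unknown flag")) for c in match.group(1).split()]


line = "Pid: 18374, comm: haproxy Tainted: G        W  O 3.5.0 #1"
for letter, meaning in decode_taint(line):
    print(letter, "-", meaning)
```

In the quoted trace the "O" matches the "(O)" next to ixgbe in the
module list, and the "W" only says that an earlier warning had already
been logged.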

Regards,
Willy

