Dear Click guys,

We are facing a problem with a node running Click that connects two machines. The problem occurs only with TCP traffic: the network hardware repeatedly reports a hang. See the kernel log appended at the end of this message.
Before explaining the cause of the problem, I will give a brief summary of our Click setup. We set up Click as a layer-2 packet forwarder. It gets a frame from one interface and either discards it or sends it out through another interface after holding it for some time determined by LAN settings.

Recent drivers and Linux kernels try to combine packets with similar header parameters into one socket buffer before passing them up the stack. This feature is called Generic Receive Offload (GRO). Click gets the packet from the kernel (from the driver, actually) and sends it to the wire after some massaging. When such a merged buffer is sent out unchanged, the hardware hangs because it does not know how to segment it.

A solution is to prepare the buffer by calling "skb_gso_segment":

--- a/todevice.cc
+++ b/todevice.cc
@@ -480,6 +480,8 @@ ToDevice::queue_packet(Packet *p, struct netdev_queue *txq)
         skb_put(skb1, need_tail);
     }
 
+    if(skb_is_gso(skb1) ) skb_gso_segment(skb1,dev->features);
+
     // set the device annotation;
     // apparently some devices in Linux 2.6 require it
     skb1->dev = dev;

In our case we disable GRO before setting up Click...

The kernel error log is included below.
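As an illustration only, here is a small userspace sketch of what "segmenting" a merged buffer means: one aggregated payload is cut into chunks no larger than the segment size the wire can carry. This is not kernel code and not what Click or the driver actually does; the helper name `segment_payload` and the sizes used are hypothetical, and real GSO segmentation also rebuilds headers and checksums for every segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): split one aggregated payload into
// chunks of at most `mss` bytes, preserving byte order. Only the payload is
// modelled here; headers and checksums are deliberately ignored.
std::vector<std::vector<std::uint8_t>>
segment_payload(const std::vector<std::uint8_t> &payload, std::size_t mss)
{
    assert(mss > 0 && "segment size must be positive");

    std::vector<std::vector<std::uint8_t>> segments;
    for (std::size_t off = 0; off < payload.size(); off += mss) {
        // The last chunk may be shorter than mss.
        const std::size_t len = std::min(mss, payload.size() - off);
        const auto first = payload.begin() + static_cast<std::ptrdiff_t>(off);
        const auto last = first + static_cast<std::ptrdiff_t>(len);
        segments.emplace_back(first, last);
    }
    return segments;
}
```

For example, a 4000-byte merged payload with an assumed segment size of 1448 bytes comes out as three chunks of 1448, 1448 and 1104 bytes. A buffer that skips this step is larger than what the sender is expected to put on the wire in a single frame.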
{{{
[ 273.820359] e1000e 0000:03:00.1: eth2: Detected Hardware Unit Hang:
[ 273.820360] TDH <c0>
[ 273.820361] TDT <d8>
[ 273.820362] next_to_use <d8>
[ 273.820363] next_to_clean <c0>
[ 273.820364] buffer_info[next_to_clean]:
[ 273.820364] time_stamp <ffffde2d>
[ 273.820365] next_to_watch <c0>
[ 273.820366] jiffies <ffffe66f>
[ 273.820367] next_to_watch.status <0>
[ 273.820368] MAC Status <80387>
[ 273.820369] PHY Status <792d>
[ 273.820369] PHY 1000BASE-T Status <3800>
[ 273.820370] PHY Extended Status <3000>
[ 273.820371] PCI Status <10>
[ 277.820398] e1000e 0000:03:00.1: eth2: Detected Hardware Unit Hang:
[ 277.820400] TDH <c0>
[ 277.820401] TDT <d8>
[ 277.820401] next_to_use <d8>
[ 277.820402] next_to_clean <c0>
[ 277.820403] buffer_info[next_to_clean]:
[ 277.820404] time_stamp <ffffde2d>
[ 277.820404] next_to_watch <c0>
[ 277.820405] jiffies <ffffea57>
[ 277.820406] next_to_watch.status <0>
[ 277.820407] MAC Status <80387>
[ 277.820408] PHY Status <792d>
[ 277.820408] PHY 1000BASE-T Status <3800>
[ 277.820409] PHY Extended Status <3000>
[ 277.820410] PCI Status <10>
[ 277.824018] ------------[ cut here ]------------
[ 277.824030] WARNING: at /build/buildd/linux-3.2.0/net/sched/sch_generic.c:255 dev_watchdog+0x25a/0x270()
[ 277.824033] Hardware name: PowerEdge 860
[ 277.824035] NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
[ 277.824036] Modules linked in: click(O) proclikefs(O) nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc radeon ttm drm_kms_helper drm dcdbas i3000_edac edac_core i2c_algo_bit shpchp psmouse serio_raw mac_hid lp parport e1000 e1000e
[ 277.824062] Pid: 0, comm: swapper/0 Tainted: G O 3.2.0-27-generic #43-Ubuntu
[ 277.824065] Call Trace:
[ 277.824067] <IRQ> [<ffffffff8106729f>] warn_slowpath_common+0x7f/0xc0
[ 277.824078] [<ffffffff81067396>] warn_slowpath_fmt+0x46/0x50
[ 277.824088] [<ffffffff81024282>] ? x86_pmu_enable+0x1f2/0x270
[ 277.824091] [<ffffffff8155f2ca>] dev_watchdog+0x25a/0x270
[ 277.824096] [<ffffffff81110cc0>] ? perf_rotate_context+0x110/0x220
[ 277.824099] [<ffffffff8155f070>] ? qdisc_reset+0x50/0x50
[ 277.824101] [<ffffffff8155f070>] ? qdisc_reset+0x50/0x50
[ 277.824106] [<ffffffff810761a6>] call_timer_fn+0x46/0x160
[ 277.824109] [<ffffffff8155f070>] ? qdisc_reset+0x50/0x50
[ 277.824113] [<ffffffff81077af2>] run_timer_softirq+0x132/0x2a0
[ 277.824119] [<ffffffff81095225>] ? ktime_get+0x65/0xe0
[ 277.824125] [<ffffffff8106ea48>] __do_softirq+0xa8/0x210
[ 277.824128] [<ffffffff8101a779>] ? read_tsc+0x9/0x20
[ 277.824132] [<ffffffff8109c1b4>] ? tick_program_event+0x24/0x30
[ 277.824137] [<ffffffff816644ec>] call_softirq+0x1c/0x30
[ 277.824141] [<ffffffff81015305>] do_softirq+0x65/0xa0
[ 277.824144] [<ffffffff8106ee2e>] irq_exit+0x8e/0xb0
[ 277.824147] [<ffffffff81664e8e>] smp_apic_timer_interrupt+0x6e/0x99
[ 277.824150] [<ffffffff81662d5e>] apic_timer_interrupt+0x6e/0x80
[ 277.824152] <EOI> [<ffffffff8107894d>] ? get_next_timer_interrupt+0x8d/0x120
[ 277.824157] [<ffffffff8101be45>] ? mwait_idle+0x95/0x210
[ 277.824160] [<ffffffff81012236>] cpu_idle+0xd6/0x120
[ 277.824164] [<ffffffff816205fe>] rest_init+0x72/0x74
[ 277.824171] [<ffffffff81cfbc03>] start_kernel+0x3b0/0x3bd
[ 277.824174] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
[ 277.824177] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
[ 277.824180] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
[ 277.824182] ---[ end trace 9f25206935d2c245 ]---
}}}

Regards
_______________________________________________
click mailing list
click@amsterdam.lcs.mit.edu
https://amsterdam.lcs.mit.edu/mailman/listinfo/click