Hello, this thread has a patch that solved the bug (for me). https://www.mail-archive.com/netdev@vger.kernel.org/msg189347.html
The patch is here: https://www.mail-archive.com/netdev@vger.kernel.org/msg189923/0001-tg3-Add-clock-override-support-for-5762.patch

I tested this patch on the following kernels and in the following situations:

1) Stable kernels 4.13.3 and 4.15 crash without the patch (as did every other version I tested). As of 4.15 (stable), the patch has not yet been merged into the mainline kernel.

2) Stable kernels 4.13.3 and 4.15 work great with the patch: no tg3 timeouts, and fast transfers on both gigabit and 10/100 links.

3) On Jan 31 (10 days ago) I wrote to the patch author, shared my results and asked when it would be merged. I am still waiting; the author is probably quite busy at the moment.

4) I ran a lot of tests over several weeks. The last session took one to two weeks of full-time work on an isolated network, using FOG, the open-source cloning solution. Several hundred GB were transferred while cloning 100+ machines in a few labs. I used both single and multicast cloning sessions, with a gigabit switch and with 10/100 switches. I tested sequentially and in parallel, with and without power failures, with and without several other patches, in many configurations and with many kernel parameters; you name it.

5) In this test scenario the bug is reproducible 100% of the time. Without the patch, my kernels always fail: I tested about 20 different versions and none of them worked. With the patch, both patched versions always work correctly.

6) A minor detail: on 4.15 the patch applies with a slight offset (2 lines, probably new comments or code), but it works anyway.

This work would have been impossible without the cooperation of the FOG team. Sebastian suggested the patch, and others helped a lot. A big "thank you" to them!

I wonder when this will be merged into the mainline kernel. Can anyone help with this?

Regards,
Paulo

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1447664

Title: 14e4:1687 broadcom tg3 network driver disconnects under high load

Status in linux package in Ubuntu: Triaged
Status in linux package in Debian: New

Bug description:

The Broadcom tg3 network driver bound to the 5762 chipset goes offline and is unable to recover (even with the tg3 watchdog timeout) when network transmit is under high load.

Call trace: https://launchpadlibrarian.net/204185480/dmesg

When this happens, usually only a reboot fixes it; sometimes, however, bringing the interface down and up again (via ifconfig) recovers networking.

I have also tested the latest tg3 driver (Dec 2014 version), and networking is still problematic. I have also disabled TSO, GSO, etc. with ethtool, and the bug still surfaces. This bug may be related to the integrated firmware.

The issue is hard to reproduce under moderate network load, so here is a procedure to reproduce it:

1. Boot a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705) from an Ubuntu/Kubuntu 14.04-15.04 Live CD.
2. From another machine, start 5 sessions and in each one repeatedly copy a 70 MB file back and forth to the tg3 machine (scp with public key authentication). (Not sure if this step is necessary.)
3. Create a 1 GB file on the tg3 machine, with something like:
     dd if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))
4. From another machine, repeatedly scp that 1 GB file from the tg3 machine. This can be done with something like:
     while true; do scp -i /my/scp/private.key u...@ip.of.tg3:/my/test/file /tmp; done

Networking usually goes offline within about 10-30 minutes.
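Parts of the procedure above can be scripted. The following is a minimal sketch, not taken from the original report: `nic_is_14e4_1687` checks (for step 1) whether an interface reports the PCI IDs from the bug title in sysfs, and `run_until_failure` is a variant of the step 4 loop that stops at the first failed copy and prints how long the load ran. The function names, the `SYSFS_ROOT` override and the ssh timeout options in the example comment are assumptions for illustration; without some timeout, scp may hang instead of failing when the link drops.

```shell
#!/usr/bin/env bash
# Sketch only: nic_is_14e4_1687, run_until_failure and SYSFS_ROOT are
# hypothetical names, not part of any existing tool.

# Overridable sysfs root (assumption: lets the check run against a fake tree).
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Step 1 (run on the tg3 machine): succeed only if the interface's PCI
# vendor/device IDs are 0x14e4/0x1687, the IDs in the bug title.
nic_is_14e4_1687() {
    local dev="$SYSFS_ROOT/class/net/$1/device"
    [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || return 1
    [ "$(cat "$dev/vendor")" = "0x14e4" ] && [ "$(cat "$dev/device")" = "0x1687" ]
}

# Step 4 variant (run on the other machine): rerun a command until it exits
# non-zero, then print the failing iteration and the elapsed seconds.
run_until_failure() {
    local start i=0
    start=$(date +%s)
    while :; do
        i=$((i + 1))
        if ! "$@"; then
            echo "iteration $i failed after $(( $(date +%s) - start ))s"
            return 1
        fi
    done
}

# Example (USER and TG3_HOST are placeholders):
#   nic_is_14e4_1687 eth0 && echo "eth0 is 14e4:1687"
#   run_until_failure scp -o ConnectTimeout=10 \
#       -o ServerAliveInterval=5 -o ServerAliveCountMax=3 \
#       -i /my/scp/private.key USER@TG3_HOST:/my/test/file /tmp
```

Stopping at the first failure, rather than looping forever as in step 4, makes it easier to compare how long a patched and an unpatched kernel survive the same load.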
WORKAROUND: Disable highdma on the interface with ethtool. To make the change permanent, add a udev rule in /etc/udev/rules.d/80-tg3-fix.rules:

    ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-3.19.0-15-generic 3.19.0-15.15
ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
Uname: Linux 3.19.0-15-generic x86_64
ApportVersion: 2.17.2-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC1:  kubuntu    3748 F.... pulseaudio
 /dev/snd/controlC0:  kubuntu    3748 F.... pulseaudio
CasperVersion: 1.360
Date: Thu Apr 23 11:16:24 2015
IwConfig:
 eth0      no wireless extensions.
 lo        no wireless extensions.
LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
ProcEnviron:
 LANGUAGE=
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash ---
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-15-generic N/A
 linux-backports-modules-3.19.0-15-generic  N/A
 linux-firmware                             1.143
RfKill:
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/22/2014
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: L06 v02.15
dmi.board.asset.tag: 2UA5041TG4
dmi.board.name: 2215
dmi.board.vendor: Hewlett-Packard
dmi.chassis.asset.tag: 2UA5041TG4
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvrL06v02.15:bd10/22/2014:svnHewlett-Packard:pnHPEliteDesk705G1MT:pvr:rvnHewlett-Packard:rn2215:rvr:cvnHewlett-Packard:ct6:cvr:
dmi.product.name: HP EliteDesk 705 G1 MT
dmi.sys.vendor: Hewlett-Packard
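To confirm that the WORKAROUND rule took effect, `ethtool -k <interface>` lists the current offload features, including highdma. The following is a minimal sketch, assuming ethtool prints that feature as a line such as `highdma: off` or `highdma: on [fixed]`; `highdma_state` is a hypothetical helper that reads the ethtool text on stdin, and eth0 is assumed to be the tg3 interface. The udevadm commands in the comments are one way to apply a new rule without rebooting.

```shell
#!/usr/bin/env bash
# Sketch only: highdma_state is a hypothetical helper, not part of ethtool.
# It reads `ethtool -k` output on stdin and prints the highdma state
# ("on" or "off"); it exits non-zero if no highdma line is present.
highdma_state() {
    # Split on ": "; match the feature name (ignoring leading whitespace)
    # and print the first word of the value, dropping e.g. "[fixed]".
    awk -F': ' '$1 ~ /^[ \t]*highdma$/ { split($2, w, " "); print w[1]; found = 1 }
                END { exit !found }'
}

# Example on the tg3 machine (eth0 assumed):
#   sudo udevadm control --reload-rules
#   sudo udevadm trigger --subsystem-match=net --action=add
#   ethtool -k eth0 | highdma_state    # "off" if the rule ran
```

Reading from stdin keeps the parsing separate from the privileged commands, so the check can be tried on saved ethtool output from any machine.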