I have determined that Debian was complaining about my Ethernet port
because I had flow control enabled on the switch.  The switch was easily
overwhelmed and would hang, so the resets Debian performed were valid.

Thank you for the research on this.  I think you can close this case.

---------------------------------------------------------------------------

On Wed, Mar 22, 2017 at 02:42:30AM +0000, Ben Hutchings wrote:
> Control: retitle -1 TX watchdog fires on e1000e interface with flow control enabled
> 
> On Tue, 2017-03-21 at 18:36 -0400, Bruce Momjian,,, wrote:
> > On Tue, Mar 21, 2017 at 04:04:11PM -0400, Bruce Momjian,,, wrote:
> > > I think this proves my problems are related to flow control.  How would
> > > you like to proceed?  Is there a patch or change you would like me to
> > > test?  Just close the ticket?
> > > 
> > > I have a fix, but it is likely others would not know they had this
> > > problem unless they were monitoring their kernel logs or their network
> > > traffic for lag.
> > 
> > Oh, I should also mention the port that is having problems is connected
> > to a NetGear GS108Ev3 switch, with current firmware, version 2.00.09. 
> > The port connected to my Actiontec FIOS router is not having problems.
> 
> I don't know about any specific bug, but if the switch sends flow
> control XOFF frames continually for long enough (usually 5 seconds)
> this will trigger the TX watchdog.
> 
> It sounds like your switch implements flow control properly (some
> broken switches auto-negotiate it but actually flood flow control
> frames).  However, if a device on some other port (that also has flow
> control enabled) sends XOFF frames continually *and* your server sends
> frames that should go to that other port, the switch will do the same
> to the server once the switch's internal queue has filled up.
> 
> If the switch has port statistics including numbers of pause frames
> then you can see where they are coming from, but I think it doesn't.
> Without that information it's going to be hard to tell exactly where
> the fault lies.
> 
> The e1000e driver *does* have statistics for pause frames transmitted
> and received (run: "ethtool -S eth0 | grep flow_control").  If you log
> these every second then it should be possible to see what happens
> around the time the TX watchdog fires.  That could provide some clues
> as to whether the NIC is behaving correctly.
> 
> Ben.
> 
> -- 
> Ben Hutchings
> Power corrupts.  Absolute power is kind of neat.
>                    - John Lehman, Secretary of the US Navy 1981-1987
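
For anyone who wants to try the per-second logging suggested above, here
is a rough sketch in Python.  It is only an illustration, not something
from the driver or from ethtool itself: the function names are made up,
the interface name eth0 is taken from the command quoted above, and it
assumes the counters of interest are the ones whose names contain
"flow_control" and that ethtool prints them as "name: value" lines.

```python
import re
import subprocess
import time

# Assumption: `ethtool -S` prints one "name: value" pair per line, and the
# pause-frame counters are the ones whose names contain "flow_control".
STAT_LINE = re.compile(r"^\s*(\w*flow_control\w*):\s*(\d+)\s*$")


def parse_flow_control(text):
    """Return {counter_name: value} for every flow_control counter in text."""
    counters = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def deltas(previous, current):
    """Return only the counters that changed, as {name: increase}."""
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value != previous.get(name, 0)
    }


def read_counters(interface):
    """Run `ethtool -S <interface>` and parse its flow_control counters."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_flow_control(result.stdout)


def log_forever(interface="eth0", interval=1.0):
    """Poll the counters and print a timestamped line whenever one changes."""
    previous = read_counters(interface)
    while True:
        time.sleep(interval)
        current = read_counters(interface)
        changed = deltas(previous, current)
        if changed:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, changed, flush=True)
        previous = current
```

Calling log_forever("eth0") with output redirected to a file gives
timestamps that can be lined up against the TX watchdog messages in the
kernel log.  Printing only the changes keeps the log short; printing
every sample instead would be a one-line change if gaps in the log need
to be distinguishable from quiet periods.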



-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
