Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

Alexander Duyck Tue, 31 Jul 2018 07:49:56 -0700

On Mon, Jul 30, 2018 at 4:43 PM, Àbéjídé Àyodélé
<abejideayod...@gmail.com> wrote:
>> Is it always the OOM errors followed by the Tx timeout?
>
> Yes, I believe I have the dmesg from one of the earlier incidents. I can
> clean that up and make it public if you want.


There shouldn't be any need. What you want to check is whether those
logs show the same pattern: OOM errors followed by the rcu_sched
warning about a detected CPU stall. If they do, the stall is the most
likely root cause of the Tx hangs that are being reported.
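
As a rough way to run that check over a saved dmesg capture, a minimal
Python sketch is below. The regular expressions are assumptions about
what the kernel and driver print (for example "invoked oom-killer",
"rcu_sched ... stall", "transmit queue N timed out"), and the function
name is made up for illustration, so adjust both to match the actual
log.

```python
import re

# Assumed message fragments, in the order we expect them to appear.
# These are guesses at typical kernel/driver wording; edit to match your log.
STAGES = [
    ("oom", re.compile(r"invoked oom-killer|out of memory", re.IGNORECASE)),
    ("rcu_stall", re.compile(r"rcu_sched.*stall", re.IGNORECASE)),
    ("tx_hang", re.compile(r"transmit queue \d+ timed out|Tx Unit Hang",
                           re.IGNORECASE)),
]


def has_oom_stall_tx_sequence(lines):
    """Return True if an OOM line is later followed by an rcu_sched stall
    line, which is in turn later followed by a Tx hang/timeout line."""
    stage = 0
    for line in lines:
        # Only look for the next expected stage; earlier stages are done.
        if STAGES[stage][1].search(line):
            stage += 1
            if stage == len(STAGES):
                return True
    return False
```

With the log saved via something like "dmesg > dmesg.txt", calling
has_oom_stall_tx_sequence(open("dmesg.txt")) reports whether that
ordering shows up at least once.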

>> Is it an actual serial connection or is it something like serial over
>> LAN?
>
> Serial over LAN
>
>> Do you know if you have any sort of flow control enabled or
>> anything that might delay displaying the message?
>
> None that I know of
>
>> Also what volume of logs are you sending over the serial interface?
>
> Just kernel logs (dmesg).
>
> Thanks
>
> Abejide Ayodele
> It always seems impossible until it's done. --Nelson Mandela

Well, at this point I am not sure there is much left we can do on our
end. What we need is for logging to the serial port to stop triggering
the RCU/CPU stall. You might try testing with the serial over LAN
logging disabled, and maybe try something like netconsole to see if
that resolves the issue. Otherwise, see if you can resolve the OOM
condition so that you aren't sending enough logs to the serial console
to trigger the stall.
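
For the netconsole route, the receiving side only needs something that
listens for UDP datagrams. A minimal Python sketch of such a listener
is below. It assumes the sending host's netconsole target points at
this machine on UDP port 6666 (change the port if your setup differs),
and the helper names are made up for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; the port is an assumption about the netconsole target."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=5.0):
    """Collect up to `count` datagrams as text, stopping early on timeout."""
    sock.settimeout(timeout)
    messages = []
    try:
        while len(messages) < count:
            data, _addr = sock.recvfrom(65535)
            # Kernel log text is normally ASCII; replace anything undecodable.
            messages.append(data.decode("utf-8", errors="replace"))
    except socket.timeout:
        pass
    return messages
```

Something like receive_messages(open_listener(), 100) would gather the
next hundred log datagrams on a second machine, which avoids pushing
the OOM output through the serial console at all.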

Thanks.

- Alex

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
