Hi all,

At work I am in the process of deploying an array of four Cubietrucks
for use in the Xen Project automated test framework.

Two of the four boards seem to work just fine in (repeated)
pre-commissioning tests, but the other two are failing fairly reliably.

One fails with:
        Timeout, server not responding.
from ssh, and the other with networking issues (DNS timeouts etc.)
during initial installation (Debian installer).

These boards are now in a colo, but previously, when they were on my
desk, the same two boards both failed with the ssh Timeout error and the
other two were OK, so the problems do seem to follow the boards across
changes of infrastructure etc.

ssh is used to log in to the boxes and drive the test case from a
controller machine. In this case the failing test is a build job, which
builds Xen or a Linux kernel etc. All build jobs run natively under
Debian (running things under Xen would follow, but it never gets past
the build jobs due to this issue).

The kernel in use is 3.16.7-ckt2 (Debian revision 1~bpo70+1).

The boards are all using u-boot v2014.10.

I can't see anything in the logs, and the ifconfig stats show no errors.
Apart from the hiccough, networking seems fine (i.e. subsequent ssh
commands do work).

ssh is invoked with the options "-o BatchMode=yes -o ConnectTimeout=100
-o ServerAliveInterval=100", and make is invoked with -j4, which doesn't
seem too aggressive.
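As far as I can tell, "Timeout, server not responding." is the message
OpenSSH's client prints when its ServerAlive keepalives go unanswered,
rather than a ConnectTimeout failure. If ServerAliveCountMax is left at
its documented default of 3, ssh_config(5) puts the disconnect at
roughly ServerAliveInterval x ServerAliveCountMax, i.e. about
100 x 3 = 300 seconds, so the board would have to be unresponsive for
several minutes. A minimal Python sketch of that arithmetic, which also
rebuilds the same ssh command line (the host name and helper names are
hypothetical, not part of the test framework):

```python
"""Sketch: rebuild the controller's ssh command line and estimate how long a
board must be silent before ssh gives up with "Timeout, server not responding."

Assumption: ServerAliveCountMax is not set anywhere, so OpenSSH's documented
default of 3 applies. Host and function names are hypothetical.
"""

from typing import List, Mapping

# Options quoted in the report above.
SSH_OPTIONS: Mapping[str, str] = {
    "BatchMode": "yes",
    "ConnectTimeout": "100",
    "ServerAliveInterval": "100",
}

# OpenSSH's default when ServerAliveCountMax is not given (ssh_config(5)).
DEFAULT_SERVER_ALIVE_COUNT_MAX = 3


def build_ssh_command(host: str, remote_cmd: str,
                      options: Mapping[str, str] = SSH_OPTIONS) -> List[str]:
    """Return an argv list suitable for subprocess.run()."""
    argv = ["ssh"]
    for key, value in options.items():
        argv += ["-o", f"{key}={value}"]
    argv += [host, remote_cmd]
    return argv


def approx_dead_peer_timeout(options: Mapping[str, str] = SSH_OPTIONS) -> int:
    """Approximate seconds of server silence before ssh disconnects:
    ServerAliveInterval * ServerAliveCountMax."""
    interval = int(options["ServerAliveInterval"])
    count_max = int(options.get("ServerAliveCountMax",
                                DEFAULT_SERVER_ALIVE_COUNT_MAX))
    return interval * count_max


if __name__ == "__main__":
    # Hypothetical host name; nothing is executed, the command is only printed.
    print(" ".join(build_ssh_command("cubietruck-1", "make -j4")))
    print(approx_dead_peer_timeout(), "seconds")
```

If ServerAliveCountMax is set elsewhere (e.g. in ~/.ssh/config), the
estimate changes accordingly.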

In the case of the failing installation,
/sys/class/net/eth0/statistics/*{dropped,errors} are all 0, and there is
nothing in dmesg or the logs. TBH this one might be a cabling or
infrastructure issue, but I'm reasonably confident that the ssh one is
not.
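The same counter check can be scripted so it is easy to repeat across
all four boards. A minimal Python sketch, assuming the standard Linux
sysfs layout; the function name is hypothetical, and the sysfs root is a
parameter so it can also be pointed at a saved copy of the directory:

```python
"""Sketch: read every *dropped and *errors counter for one interface, the
scripted equivalent of /sys/class/net/eth0/statistics/*{dropped,errors}.

Assumes the standard Linux sysfs layout; the function name is hypothetical.
"""

from pathlib import Path
from typing import Dict


def read_error_counters(iface: str = "eth0",
                        sysfs_root: Path = Path("/sys/class/net")) -> Dict[str, int]:
    """Return {counter_name: value} for all *dropped and *errors files."""
    stats_dir = sysfs_root / iface / "statistics"
    counters: Dict[str, int] = {}
    # pathlib's glob has no shell brace expansion, so match each suffix in turn.
    for pattern in ("*dropped", "*errors"):
        for path in sorted(stats_dir.glob(pattern)):
            counters[path.name] = int(path.read_text().strip())
    return counters
```

Calling read_error_counters("eth0") on each board and checking
any(counters.values()) gives a quick yes/no on whether any drop or error
counter has moved.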

Since two of the boards are OK and two are not, I suppose something
somewhere must be marginal.

I'm not really sure where to start looking. Perhaps CONFIG_GMAC_TX_DELAY
on the u-boot side might be relevant? I've had a look through the logs
from v3.16 to master for drivers/net/ethernet/stmicro/stmmac and nothing
leaps out as being a relevant backport.

Or perhaps it isn't networking-related at all, e.g. AHCI might be
stalling and stopping sshd from responding to pings. There's nothing in
the logs to indicate one way or the other.
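A hypothetical way to gather evidence one way or the other: run a
heartbeat on the board that prints a timestamp every second to the
serial console (so it does not itself depend on the disk or the
network), then look for gaps in the captured log around the time of an
ssh timeout. A gap would point to the whole system stalling; a steady
heartbeat would point back at networking. The Python sketch below
assumes that setup; heartbeat() and find_stalls() are illustrative
names, not existing tools:

```python
"""Sketch: heartbeat on the board plus gap detection on the captured log.

Hypothetical setup: heartbeat() runs on the board with its output sent to
the serial console; the captured lines are parsed into floats and passed
to find_stalls() on another machine.
"""

import time
from typing import List, Sequence, Tuple


def heartbeat(period: float = 1.0) -> None:
    """Print a wall-clock timestamp every 'period' seconds, forever."""
    while True:
        print(f"{time.time():.3f}", flush=True)
        time.sleep(period)


def find_stalls(timestamps: Sequence[float],
                threshold: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start_time, gap_seconds) for each gap between consecutive
    heartbeats that exceeds 'threshold' seconds."""
    stalls = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap > threshold:
            stalls.append((earlier, gap))
    return stalls
```

The threshold should sit well above the heartbeat period but well below
the roughly 300 s ssh keepalive window, so that a stall long enough to
trip ssh cannot be missed.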

Any bright ideas on where to look / what to try would be gratefully
received.

Ian.


-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.