Re: [beagleboard] Re: Boot failure "external abort on non-linefetch" in cpsw_probe with any image after Wi-Fi install

Loren Amelang Tue, 21 Jan 2014 15:02:05 -0800

My board is back from RMA, with what looks like a new ethernet chip. It is 
ever so slightly raised up along one edge, though the pin alignment and 
soldering job are perfect. I guess it could have been that way before, but 
usually that's a sign of manual replacement. The only info I was able to 
get from the RMA Team was: 
---
After running the diagnostic tests, we found that there was a Ethernet 
malfunction. We have fixed the issue and everything is properly working. 
---

The board was carefully solvent cleaned after the repair; a little glob of 
glue or rosin I had noticed before is now gone. But I noticed lots of tiny 
solder splashes on the bottom of the board, mostly along the expansion 
connector pins. A couple of them could have been a real problem if the 
board coating hadn't protected the traces. All popped off easily with a 
fingernail or blunt plastic tool. 

So far, the board boots fine and works as expected. 

The differences between a successful boot (<) and the panic (>):

< cpsw, usb_ether
---
> Phy not found  <-- with bad ethernet, just before reading uEnv.txt
> PHY reset timed out
> cpsw, usb_ether

< [ time ] pinctrl-single 44e10800.pinmux: could not request pin 21 on device pinctrl-single
< systemd-fsck[85]: Angstrom: clean, 49509/112672 files, 354728/449820 blocks
< [ time ] libphy: PHY 4a101000.mdio:01 not found  <-- with good ethernet!
< [ time ] net eth0: phy 4a101000.mdio:01 not found on slave 1  <-- last line before logo
< 
< .---O---.                                           
< |       |                  .-.           o o        
< |   |   |-----.-----.-----.| |   .----..-----.-----.
< |       |     | __  |  ---'| '--.|  .-'|     |     |
< |   |   |  |  |     |---  ||  --'|  |  |  '  | | | |
< '---'---'--'--'--.  |-----''----''--'  '-----'-'-'-'
<                 -'  |
<                 '---'
---
> [ time ] pinctrl-single 44e10800.pinmux: could not request pin 21 on device pinctrl-single
> [ time ] Unhandled fault: external abort on non-linefetch (0x1008) at 0xe09fe000

So in both conditions it complains about "phy not found"! With a bad chip, 
it complains near the beginning of U-Boot. With working ethernet, it 
complains at the very end of kernel boot. Someone who knows the details of 
cpsw_probe needs to figure out how to make it report a failed ethernet chip 
gracefully, and why libphy still reports an error when the ethernet is good 
and boot is successful. 
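
For anyone comparing their own console captures against the two logs above, here is a minimal sketch that sorts a log by which of these messages it contains. The message fragments are copied from the diff; the function name and the three labels are my own invention, and it only classifies symptoms. It says nothing about what cpsw_probe is actually doing.

```python
import re

# Message fragments copied from the two boot logs compared above.
UBOOT_PHY_FAILURE = re.compile(r"Phy not found|PHY reset timed out")
KERNEL_ABORT = re.compile(r"Unhandled fault: external abort on non-linefetch")
LIBPHY_WARNING = re.compile(r"libphy: PHY \S+ not found")


def classify_boot_log(text):
    """Return a short label for the PHY symptoms found in a console log.

    The labels are hypothetical names chosen for this sketch.
    """
    if UBOOT_PHY_FAILURE.search(text) or KERNEL_ABORT.search(text):
        # Seen only in the log from the board with the bad ethernet chip.
        return "suspect-ethernet-hardware"
    if LIBPHY_WARNING.search(text):
        # Seen in the log from the repaired board, which booted normally.
        return "booted-with-libphy-warning"
    return "no-phy-messages"
```

Feed it the text saved from the serial console, e.g. `classify_boot_log(open("boot.log").read())`, where `boot.log` is whatever file your terminal program logged to.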

I'm finally able to login and view files. I'm wondering if these are 
standard, or are they leftover from the RMA testing:
---
root@beaglebone:/# cat /media/BEAGLEBONE/uEnv.txt
optargs=quiet drm.debug=7
root@beaglebone:/# cat /media/BEAGLEBONE/uEnv.txtboot
optargs=run_hardware_tests quiet
---

After receiving the board back, I couldn't use VNC or SSH, though I could 
ping the ethernet ports. In both cases Wireshark showed my external request 
followed by an immediate RST from the BBB. I tried re-installing the 
previous VNC package, but it said "Package x11vnc (0.9.13-r0.8) installed 
in root is up to date." Still, the trick to make it load itself didn't seem 
to work. I found 
http://feeds.angstrom-distribution.org/feeds/v2012.12/ipk/eglibc/all/angstrom-x11vnc-xinit_1.0-r2.0_all.ipk
and that installed and worked immediately after a restart. The "netstat 
-lntu" command did not see it until after it was active, even though it did 
seem to see all the other open ports immediately after booting. 
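
The "ping works but the connection gets an immediate RST" symptom can also be checked from the PC side without Wireshark. This is a rough sketch, not anything from the BBB images: the function name is made up, and the address and ports in the usage note (192.168.7.2 for the USB network, 22 for SSH, 5900 for VNC) are assumptions about a default setup. A refused connection is what the RST looks like to the client.

```python
import socket


def probe_tcp_port(host, port, timeout=2.0):
    """Try one TCP connection and report how the remote end responded."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except ConnectionRefusedError:
            # The host answered with RST: reachable, but nothing is listening.
            return "refused"
        except socket.timeout:
            # No answer at all within the timeout.
            return "timeout"
        except OSError:
            # Anything else, e.g. no route to the host.
            return "unreachable"
        return "open"
```

For example, `probe_tcp_port("192.168.7.2", 22)` and `probe_tcp_port("192.168.7.2", 5900)` would show whether SSH and VNC are accepting connections, assuming those are the address and ports in use.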

SSH was trickier. I finally found
https://groups.google.com/forum/#!topic/beagleboard/Ya2qE4repSY
-----
"ssh_exchange_identification: Connection closed by remote host"
From looking at the script above (/etc/init.d/dropbear) it seems like the 
identity file in /etc/dropbear/dropbear_rsa_host_key might be causing the 
problem and the script recreates them if they don't exist.  So I removed it 
and started dropbear (/etc/init.d/dropbear start) again and it generated 
new keys and then I could ssh in.  It now works!  (The side effect of doing 
this is you also have to remove a line in the client's ~/.ssh/known_hosts 
because the identity of the beaglebone has changed.)
-----
My /etc/dropbear/dropbear_rsa_host_key file was zero-length, so I removed 
it. The "dropbear start" command didn't work for me; a BBB restart was 
required after I manually deleted the key file. I also unchecked the 
"History" box in TeraTerm, and it saved a new RSA fingerprint. SSH now 
works with the default password choice and a blank password field, and 
also works with Tunnelier. 
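
The zero-length key check can be scripted so that a valid host key is never deleted by mistake. A minimal sketch, assuming Python is available on the board and using the key path mentioned above as the default; the function name is hypothetical. It does not restart dropbear, and in my case a reboot was still needed afterwards.

```python
import os


def remove_empty_host_key(path="/etc/dropbear/dropbear_rsa_host_key"):
    """Delete the key file only if it exists and is zero-length.

    Returns True if the file was deleted, False otherwise.
    """
    try:
        if os.path.getsize(path) != 0:
            # A non-empty key is left alone.
            return False
    except FileNotFoundError:
        # Nothing to remove.
        return False
    os.remove(path)
    return True
```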

Other random things I just learned...  

At least on Windows, when the USB cable is connected, there is a "Gadget 
Serial" device USBSER000 from "Linux Developer Community" available as a 
COM port (ttyGS0 in the BBB), alongside the "USB Serial Port" VCP0 from 
FTDI which is my debug console adapter COM port (ttyO0 in the BBB). The 
"gadget" port is only active after boot is complete, so I didn't have much 
opportunity to see it before! But it claimed a lower COM port number, so I 
assume it installed along with the ethernet gadget when I first connected 
via USB. 

That leaves the question, could I have somehow fried my ethernet chip? I 
checked my incoming cable and it is fully DC isolated. The connector on the 
BBB is fully DC isolated. It is not a PoE-capable connector; there is no 
diode array that could feed power into the grounded pin 8. So if I did 
something to cause my failure, it was not through the ethernet cable. 

On Tuesday, January 7, 2014 2:57:59 PM UTC-8, Vaibhav wrote:
>
>
> Hi Gerald,
>
> In case it's not a logistics nightmare (which it could very well be), 
> could you let the list know if it's really a HW thing? Based on the 
> multiple things that Loren has tried out, my hunch is it might be. However, 
> there's always the possibility of this being due to some subtle s/w bug 
> that someone would need to chase down and make things more reliable.
>  
>

