Hi All,

After removing C24 and C30 (next to the large unpopulated 20-pin header P2 on the bottom of the board), we ran 1000 power cycles and had a 100% success rate, i.e. the board booted and the PHY was detected every time. We used a programmable power supply and some scripts that processed the UART output to count observed instances of "libphy: PHY 4a101000.mdio:00 not found" and "net eth0: phy found : id is : 0x7c0f1", and momentarily interrupted the power supply after seeing either.

We ran the same test on an unmodified board and had a failure rate of 54/1000 (5.4%).

Regards,
Andrew Glen.

On 27 April 2017 at 15:53, Andrew Glen <andrewtaneg...@gmail.com> wrote:
> FYI: The hardware fix described earlier in this thread gives 100% success, first time, every time.
>
> On 27 April 2017 at 15:42, <bigj...@gmail.com> wrote:
>
>> If you have this problem and only care about solutions, jump to "workarounds" below.
>>
>> ### RECAP
>>
>> For unlucky souls who come fresh upon this problem and don't want to read through the better part of a decade's worth of conflicting reports...
>>
>> 1. Due to a design issue, the BeagleBone Black and its descendants intermittently come up with bad state set in the physical network connection chip (PHY). This makes the wired Ethernet port inaccessible, and there is no way to get it to recover using only software - a power cycle or hardware reset is required.
>>
>> 2. One of the ways that the PHY can have bad state is that its address can be assigned a different value than expected. The latest versions of the kernel will scan all possible addresses and find the PHY no matter what address it happens to get, so this failure mode is no longer part of the issue as long as you use one of these new kernels. (BTW, I have an elegant solution to reassign the PHY back to the expected address which will work with any kernel version if you need it.
>> It also avoids the current kludge that hacks up the device tree to match the newly found PHY address.)
>>
>> 3. There are still some bad states that the PHY chip can come up in that are not addressed by the new kernel. As far as I know there is no software-only workaround for these - a power cycle or hardware reset is required.
>>
>> 4. In my personal experience, the bad state seems to be significantly less likely when the board is powered through the barrel connector (or USB on BeagleBone Green) than when it is powered via the pin on the P9 header. I've also noticed that most people in this thread are powering their boards via a cape or header-connected power supply, which makes sense since these people tend to see the problem more often. Note that the non-recoverable bad state can still happen even on a board powered via the barrel - it is just less likely.
>>
>> 5. In my personal experience, the bad state seems to be more likely on certain individual boards than others. I have a board that comes up in the bad state about 50% of the time, while other boards only come up in the bad state 1 in 100 times.
>>
>> 6. In my experience, the bad state seems to be significantly less likely if *nothing* is connected to the Ethernet port at power up. I really mean not connected - even if there is only an unpowered device connected to the other end of the network cable, the bad state occurs more often. The cable must be unplugged at one end or the other.
>>
>> 7. Bit 13 in register 18 seems to be a 100% indication that you are in the bad state. I have never seen a board with that bit set recover, and I have never seen a non-recoverable board without that bit set (except for a couple of seconds if you manually clear it before it sets itself on again). This bit is "reserved" in the datasheet, and so far there have been no hints from Microchip as to what it might mean that might lead to a better understanding of the issue.
>>
>> 8. In the bad state, it is possible to get the PHY to link by manually configuring it to 10 Mb/s half duplex (no auto-negotiation). While the link light comes on and the "link active" bit is set, it does not appear to be decoding incoming packets, so this is not a useful workaround.
>>
>> ### WORKAROUNDS
>>
>> In order of effectiveness/desirability.
>>
>> 1. Use a different board. All the commercially available BeagleBone Black boards and descendants share this design issue, so look at maybe the Raspberry Pi or one of the other ARM-based SBCs.
>>
>> 2. Spin your own version of the board. This problem could be completely resolved by adding a connection between the reset pin of the PHY and a gpio on the ARM. This way the ARM would be able to carefully control the required timing sequence for bringing up the PHY chip, and would also be able to hardware reset the chip in case there are any problems.
>>
>> 3. Use a USB Ethernet adapter rather than the on-board eth0 port. Compatible adapters can be found for less than $10.
>>
>> 4. Connect a gpio pin to the reset pin on header P9. That reset pin is tied to the hardware reset pin of the PHY chip, so you can reset it under software control. gpio 60 happens to be very close physically, making for a very easy jumper connection. Then you need a script to test for the bad state and activate the gpio to reset if it is found. Note that the reset pin will also reset the ARM, so the BB will reboot every time you do this, but it should eventually come up (and stay up) with the PHY in the good state.
>>
>> 5. Unplug the Ethernet cable during power up, check for the bad state after the board comes up, and keep power cycling it until it comes up in a good state, then reconnect the network cable.
>>
>> 6. Power the board through the barrel or USB rather than through the headers.
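
The "test for the bad state" step in workaround 4 can be sketched in Python using the bit-13 rule from recap item 7. This is only a minimal sketch: it assumes the raw 16-bit value of PHY register 18 has already been read by some other means (for example with the phyreg tool mentioned in this thread), and the function name `phy_in_bad_state` and the sample values are hypothetical. Only the "bit 13 of register 18" rule comes from the post itself.

```python
# Bit 13 of PHY register 18 is the bad-state indicator described in the recap.
BAD_STATE_BIT = 13


def phy_in_bad_state(reg18_value: int) -> bool:
    """Return True if bit 13 is set in the 16-bit value read from register 18."""
    if not 0 <= reg18_value <= 0xFFFF:
        raise ValueError("PHY management registers are 16 bits wide")
    return bool(reg18_value & (1 << BAD_STATE_BIT))


if __name__ == "__main__":
    # Hypothetical register values, for illustration only.
    for value in (0x0000, 0x2000, 0x20E0):
        print(f"reg18=0x{value:04x} bad_state={phy_in_bad_state(value)}")
```

A watchdog script would call this after boot and pulse the reset gpio only when it returns True; how the register is read and how the gpio is driven are left out here because they depend on the kernel and tooling in use.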
>>
>> Through a combination of 5 & 6, I was able to get my bank of boards to come up in the good state better than 80% of the time on the first try. Yona Applegate (of LEDscape fame) reports being able to get his large collection of BBs to all come up with good networking 100% of the time using #4, although the amount of time it takes for all boards to get to the good state is indeterminate.
>>
>> ### FUTURE DIRECTIONS
>>
>> There are likely other workarounds possible if someone wants to invest more time working on this issue.
>>
>> Here is a tool that lets you easily inspect and modify registers in the PHY:
>> https://github.com/bigjosh/phyreg
>>
>> Here are all my notes from debugging this issue:
>> https://www.evernote.com/pub/bigjosh2/bbbphyproblem
>>
>> I am happy to try to help anyone who wants to dig in deeper. I personally would love to not have to unplug/replug 72 ethernet cables every time I have to power cycle my bank of BBBs!
>>
>> -josh
>>
>> On Tuesday, November 26, 2013 at 5:22:42 PM UTC-5, AndrewTaneGlen wrote:
>>>
>>> Hello,
>>>
>>> I have noticed very rare cases (~1/50) of the ethernet phy on the BeagleBone Black not being detected on boot, and requiring a hard reset (as opposed to calling 'reset' from the command line) to get it to work/be detected again.
>>>
>>> This problem has been mentioned in a couple of other threads (below) concerning different topics (i.e. problems getting the BBB to boot, and the ethernet phy 'dying' some time after initially working fine), with no solution/workaround for this specific problem being suggested, so I thought I'd start a thread specifically for it.
>>>
>>> https://groups.google.com/forum/#!msg/beagleboard/Vp4pxwHm8BU/Iaw3p5xm0MoJ
>>> https://groups.google.com/forum/#!topic/beagleboard/aXv6An1xfqI
>>>
>>> In the first thread mlc/Mike discussed his response to the problem as follows:
>>>
>>> *"I had issues with the network not coming up on boot, and it was traced down to problems with the SYS_RESETn line. I had a level translator connected to SYS_RESETn, to drive a 5V chip. It was powered by a 5V rail. If the 5V rail powered up "differently" than the 3.3V rail (not sure of the exact relationship), I guess it pulled the SYS_RESETn line to weird levels that affected the network chip but not the main processor. I'm now using a GPIO to drive the external 5V chip now, instead of the SYS_RESETn line. Anyway, the moral is be very, very careful with SYS_RESETn, because it can cause hard-to-trace problems with networking."*
>>>
>>> I see that the A6 Revision of the BeagleBone Black has some changes to the SYS_RESETn line:
>>>
>>> *"Based on notification from TI, in random instances there could be a glitch in the SYS_RESETn signal from the processor where the SYS_RESETn signal was taken high for a momentary amount of time before it was supposed to. To prevent this, the signal was ORed with the PORZn (Power On reset)."*
>>> (http://elinux.org/Beagleboard:BeagleBoneBlack#Revision_A6_.28Production_Version.29)
>>>
>>> Is it likely that this modification will improve/resolve the issue I am seeing with the ethernet phy not resetting/powering up correctly, seeing as the SYS_RESETn signal also feeds into the nRST pin on the ethernet phy? (The SYS_RESETn line is left untouched in my application.)
>>>
>>> Some additional observations from dmesg concerning this issue:
>>>
>>> On a good phy boot I see the following:
>>> [ 2.810749] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
>>> [ 2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
>>> [ 2.833517] libphy: 4a101000.mdio: probed
>>> [ 2.837871] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver unknown
>>>
>>> Followed later by:
>>> [ 21.286920] net eth0: initializing cpsw version 1.12 (0)
>>> [ 21.301166] net eth0: phy found : id is : 0x7c0f1
>>>
>>> On a 'bad phy' boot I see the following (differences highlighted):
>>> [ 2.806763] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
>>> [ 2.813213] davinci_mdio 4a101000.mdio: detected phy mask *fffffffb*
>>> [ 2.829512] libphy: 4a101000.mdio: probed
>>> [ 2.833875] davinci_mdio 4a101000.mdio: phy[2]: device 4a101000.mdio:02, driver unknown
>>>
>>> Followed later by:
>>> [ 21.346861] net eth0: initializing cpsw version 1.12 (0)
>>> [ 21.354379] *libphy: PHY 4a101000.mdio:00 not found*
>>> [ 21.359469] *net eth0: phy 4a101000.mdio:00 not found on slave 0*
>>>
>>> So it looks like the 'davinci_mdio_reset' function sees the phy in both instances, but reports it differently on the bad boot (phy[2] rather than phy[0]). I am not sure what to make of this.
>>>
>>> I am using the Debian 7.2 rootfs and the 'RobertCNelson' kernel '3.12.0-bone8'.
>>>
>>> Regards,
>>> Andrew.
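
For anyone who wants to repeat the power-cycle count, the console messages quoted in this thread are enough to classify a boot. The following Python sketch is only an illustration, not the script used for the 1000-cycle test: the function names `classify` and `live_phy_addresses` are made up here, and the reading of the "detected phy mask" value (a clear bit marks an address that answered) is an assumption inferred from the two boots quoted in this thread, where mask fffffffe goes with phy[0] and fffffffb goes with phy[2].

```python
import re

# Patterns built from the console messages quoted in this thread.
GOOD = re.compile(r"net eth0: phy found : id is : (0x[0-9a-fA-F]+)")
BAD = re.compile(r"libphy: PHY \S+ not found")
MASK = re.compile(r"detected phy mask ([0-9a-fA-F]{1,8})")


def classify(line):
    """Return 'good', 'bad', or None for a single console line."""
    if GOOD.search(line):
        return "good"
    if BAD.search(line):
        return "bad"
    return None


def live_phy_addresses(line):
    """Return the addresses (0-31) whose bit is clear in the printed mask.

    Returns None if the line carries no 'detected phy mask' message.
    Assumption: a clear bit means a PHY answered at that address.
    """
    match = MASK.search(line)
    if match is None:
        return None
    mask = int(match.group(1), 16)
    return [addr for addr in range(32) if not (mask >> addr) & 1]


if __name__ == "__main__":
    sample = [
        "[ 2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe",
        "[ 21.301166] net eth0: phy found : id is : 0x7c0f1",
        "[ 2.813213] davinci_mdio 4a101000.mdio: detected phy mask fffffffb",
        "[ 21.354379] libphy: PHY 4a101000.mdio:00 not found",
    ]
    counts = {"good": 0, "bad": 0}
    for line in sample:
        result = classify(line)
        if result is not None:
            counts[result] += 1
        addresses = live_phy_addresses(line)
        if addresses is not None:
            print("PHY answered at address(es):", addresses)
    print(counts)
```

In a real test loop the lines would come from the serial port rather than a list, and the power supply would be interrupted as soon as `classify` returns either result.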