This one:

Hi All,

After removing C24 and C30 (next to the large unpopulated 20-pin header P2
on the bottom of the board) we ran 1000 power cycles and had a 100%
success rate - i.e. board booted and phy detected every time.

We used a programmable power supply and some scripts processing the UART
output to count observed instances of "libphy: PHY 4a101000.mdio:00 not
found" and "net eth0: phy found : id is : 0x7c0f1", and momentarily
interrupted the power supply after seeing either.
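
A minimal sketch of the log-matching part of such a script, assuming the UART output arrives as one kernel message per line. The function names are hypothetical, and the power supply control is left out because it depends on the supply being used.

```python
import re

# Patterns copied from the two kernel messages quoted in this thread.
PHY_NOT_FOUND = re.compile(r"libphy: PHY 4a101000\.mdio:00 not found")
PHY_FOUND = re.compile(r"net eth0: phy found : id is : 0x7c0f1")


def classify_line(line):
    """Return 'fail', 'pass', or None for one line of UART output."""
    if PHY_NOT_FOUND.search(line):
        return "fail"
    if PHY_FOUND.search(line):
        return "pass"
    return None


def count_boots(lines):
    """Count pass/fail verdicts in an iterable of UART lines."""
    counts = {"pass": 0, "fail": 0}
    for line in lines:
        verdict = classify_line(line)
        if verdict is not None:
            counts[verdict] += 1
            # In a real rig, this is the point at which the power
            # supply would be interrupted to start the next cycle.
    return counts
```

Each boot produces exactly one of the two messages, so the sum of both counters is the number of completed power cycles.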

We ran the same test on an unmodified board and had a failure rate of
54/1000.
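
As a rough sanity check on these numbers, the arithmetic below estimates how likely 1000 clean boots would be if the modified board still failed at the rate seen on the unmodified one. It assumes each power cycle is independent, which is an assumption and not something measured here.

```python
import math

failures, cycles = 54, 1000
rate = failures / cycles  # observed failure rate on the unmodified board

# If the modified board still failed at that rate, the chance of
# 1000 consecutive good boots would be (1 - rate) ** 1000.
p_all_good = (1.0 - rate) ** cycles
log10_p = cycles * math.log10(1.0 - rate)

print(f"observed failure rate: {rate:.1%}")
print(f"log10 of P(1000 clean boots at that rate): {log10_p:.1f}")
```

Under that assumption the result is vanishingly small, so the 1000/1000 run is very unlikely to be luck.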


Regards,
Andrew Glen.

On 27 April 2017 at 15:53, Andrew Glen <andrewtaneg...@gmail.com> wrote:

> FYI: The hardware fix described earlier in this thread gives 100% success,
> first time, every time.
>
> On 27 April 2017 at 15:42, <bigj...@gmail.com> wrote:
>
>> If you have this problem and only care about solutions, jump to
>> "workarounds" below.
>>
>> ### RECAP
>>
>> For unlucky souls who come fresh upon this problem and don't want to read
>> through the better part of a decade's worth of conflicting reports....
>>
>> 1. Due to a design issue, the BeagleBone Black and descendants have a
>> problem where they intermittently come up with one of several bad states
>> set in the physical network connection chip (PHY) that make the wired
>> Ethernet port inaccessible, and there is no way to get it to recover using
>> only software - a power cycle or hardware reset is required.
>>
>> 2. One of the ways that the PHY can have bad state is that its address
>> can be assigned a different value than expected. The latest versions of the
>> kernel will scan all possible addresses and find the PHY no matter what
>> address it happens to get, so this failure mode is no longer part of the issue
>> as long as you use one of these new kernels. (BTW, I have an elegant
>> solution to reassign the PHY back to the expected address which will work
>> with any kernel version if you need it. It also avoids the current kluge
>> that hacks up the device tree to match the new found PHY address.)
>>
>> 3. There are still some bad states that the PHY chip can come up in that
>> are not addressed by the new kernel. As far as I know there is no software
>> only workaround for these - a power cycle or hardware reset is required.
>>
>> 4. In my personal experience, the bad state seems to be significantly
>> less likely when the board is powered through the barrel connector (or USB
>> on BeagleBone Green) than when it is powered via the pin on the P9 header.
>> I've also noticed that most people in this thread are powering their boards
>> via a cape or header-connected power supply, which makes sense since these
>> people tend to see the problem more often. Note that the non-recoverable
>> bad state can still happen even on a board powered via the barrel - it is
>> just less likely.
>>
>> 5. In my personal experience, the bad state seems to be more likely on
>> certain individual boards than others. I have a board that comes up in the
>> bad state about 50% of the time, while other boards only come up in the bad
>> state 1 in 100 times.
>>
>> 6. In my experience, the bad state seems to be significantly less likely if
>> *nothing* is connected to the Ethernet port at power up. I really mean not
>> connected - even if there is an unpowered device connected to the other end
>> of the network cable, then the bad state occurs more often. The cable must
>> be unplugged at one end or the other.
>>
>> 7. Bit 13 in register 18 seems to be a 100% indication that you are in
>> the bad state. I have never seen a board with that bit set recover, and I
>> have never seen a non-recoverable board without that bit set (except for a
>> couple of seconds if you manually clear it before it sets itself on again).
>> This bit is "reserved" in the datasheet and so far no hints from Microchip
>> as to what it might mean that might lead to a better understanding of the
>> issue.
>>
>> 8. In the bad state, it is possible to get the PHY to link by manually
>> configuring it to 10 Mbps half duplex (no auto negotiation). While the link
>> light comes on and the "link active" bit is set, it does not appear to be
>> decoding incoming packets so this is not a useful workaround.
>>
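
The check described in item 7 can be sketched as a one-line bit test. This is only an illustration: the function name is hypothetical, and it takes the 16-bit value already read from register 18, since how that value is read (for example with a register tool) is outside the sketch.

```python
BAD_STATE_REGISTER = 18  # register number named in item 7
BAD_STATE_BIT = 13       # reserved bit observed set in the bad state


def phy_in_bad_state(reg18_value):
    """Return True if bit 13 is set in a 16-bit value read from register 18."""
    return bool(reg18_value & (1 << BAD_STATE_BIT))
```

For example, a register value of 0x2000 has only bit 13 set and would be reported as the bad state.
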
>> ### WORKAROUNDS
>>
>> In order of effectiveness/desirability.
>>
>> 1. Use a different board. All the commercially available BeagleBone Black
>> and descendants share this design issue, so look at maybe the Raspberry Pi
>> or one of the other ARM based SBCs.
>>
>> 2. Spin your own version of the board. This problem could be completely
>> resolved by adding a connection between the reset pin of the PHY and a gpio
>> on the ARM. This way the ARM would be able to carefully control the required
>> timing sequence for bringing up the PHY chip - and also be able to hardware
>> reset the chip in case there are any problems.
>>
>> 3. Use a USB Ethernet adapter rather than the on-board eth0 port.
>> Compatible adapters can be found for less than $10.
>>
>> 4. Connect a gpio pin to the reset pin on header P9. That reset pin is
>> tied to the hardware reset pin of the PHY chip, so you can reset it under
>> software control. gpio 60 happens to be very close physically, making for a
>> very easy jumper connection. Then you need a script to test for the bad
>> state, and activate the gpio to reset if it is found. Note that the reset
>> pin will also reset the ARM, so the BB will reboot every time you do this
>> but should eventually come up (and stay up) with the PHY in the good
>> state.
>>
>> 5. Unplug the Ethernet cable during power up, check for bad state
>> after the board comes up, and keep power cycling it until it comes up in a
>> good state, then reconnect the network cable.
>>
>> 6. Power the board through the barrel or USB rather than through the
>> headers.
>>
>> Through a combination of 5 & 6, I was able to get my bank of boards to
>> come up with a better than 80% good state rate on the first try. Yona
>> Applegate (of LEDscape fame) reports being able to get his large collection
>> of BBs to all come up with good networking 100% of the time using #4,
>> although the amount of time it takes for all boards to get to the good
>> state is indeterminate.
>>
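
The reset half of workaround 4 could look roughly like the sketch below. It assumes the legacy Linux sysfs GPIO interface under /sys/class/gpio, that gpio 60 is jumpered to the P9 reset pin, and that the reset line is active low (as the SYS_RESETn name suggests). The function name and the hold time are hypothetical, and the bad-state test that decides when to call it is not shown.

```python
import os
import time


def pulse_reset(gpio=60, hold_s=0.1, sysfs="/sys/class/gpio"):
    """Drive the GPIO low for hold_s seconds, then release it.

    Assumes the GPIO is jumpered to the active-low reset pin on P9.
    """
    pin_dir = os.path.join(sysfs, f"gpio{gpio}")
    if not os.path.isdir(pin_dir):
        # Ask the kernel to expose the pin through sysfs.
        with open(os.path.join(sysfs, "export"), "w") as f:
            f.write(str(gpio))
    direction = os.path.join(pin_dir, "direction")
    # Writing "low" configures the pin as an output driven low.
    with open(direction, "w") as f:
        f.write("low")
    time.sleep(hold_s)
    # Switch back to an input so the reset line is released.
    with open(direction, "w") as f:
        f.write("in")
```

Because the same pin resets the ARM, the board will normally reboot as soon as the line goes low, so the release step mostly matters if the jumper is missing or the pulse is ignored.
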
>> ### FUTURE DIRECTIONS
>>
>> There are likely other workarounds possible if someone wants to invest
>> more time working on this issue.
>>
>> Here is a tool that lets you easily inspect and modify registers in the
>> PHY....
>> https://github.com/bigjosh/phyreg
>>
>> Here are all my notes from debugging this issue...
>> https://www.evernote.com/pub/bigjosh2/bbbphyproblem
>>
>> I am happy to try and help anyone who wants to dig in deeper. I personally
>> would love to not have to unplug/replug 72 ethernet cables every time I
>> have to power cycle my bank of BBBs!
>>
>> -josh
>>
>> On Tuesday, November 26, 2013 at 5:22:42 PM UTC-5, AndrewTaneGlen wrote:
>>>
>>> Hello,
>>>
>>> I have noticed very rare cases (~1/50) of the ethernet phy on the
>>> Beaglebone Black not being detected on boot, and requiring a hard reset (as
>>> opposed to calling 'reset' from the command line) to get it to work/be
>>> detected again.
>>>
>>> This problem has been mentioned in a couple of other threads (below)
>>> concerning different topics (i.e. problems getting the BBB to boot, and the
>>> ethernet phy 'dying' some time after initially working fine), with no
>>> solution/workaround for this specific problem being suggested - so I
>>> thought I'd start a thread specifically for it.
>>> https://groups.google.com/forum/#!msg/beagleboard/Vp4pxwHm8BU/Iaw3p5xm0MoJ
>>> https://groups.google.com/forum/#!topic/beagleboard/aXv6An1xfqI
>>>
>>> In the first thread mlc/Mike discussed his response to the problem as
>>> follows:
>>>
>>> *"I had issues with the network not coming up on boot, and it was
>>> traced down to problems with the SYS_RESETn line. I had a level translator
>>> connected to SYS_RESETn, to drive a 5V chip. It was powered by a 5V rail.
>>> If the 5V rail powered up "differently" than the 3.3V rail (not sure of the
>>> exact relationship), I guess it pulled the SYS_RESETn line to weird levels
>>> that affected the network chip but not the main processor. I'm now using a
>>> GPIO to drive the external 5V chip now, instead of the SYS_RESETn
>>> line. Anyway, the moral is be very, very careful with SYS_RESETn, because
>>> it can cause hard-to-trace problems with networking.*"
>>>
>>> I see that the A6 Revision of the Beaglebone Black has some changes to
>>> the SYS_RESETn line:
>>>
>>> "*Based on notification from TI, in random instances there could be a
>>> glitch in the SYS_RESETn signal from the processor where the SYS_RESETn
>>> signal was taken high for a momentary amount of time before it was supposed
>>> to. To prevent this, the signal was ORed with the PORZn (Power On reset).*"
>>> (http://elinux.org/Beagleboard:BeagleBoneBlack#Revision_A6_.28Production_Version.29)
>>>
>>> Is it likely that this modification will improve/resolve the issue I am
>>> seeing with the ethernet phy not resetting/powering-up correctly, seeing as
>>> the SYS_RESETn signal also feeds into the nRST pin on the ethernet phy? (The
>>> SYS_RESETn line is left untouched in my application.)
>>>
>>>
>>> Some additional observations from dmesg concerning this issue:
>>>
>>> On a good phy boot I see the following:
>>> [    2.810749] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
>>> [    2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
>>> [    2.833517] libphy: 4a101000.mdio: probed
>>> [    2.837871] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver unknown
>>>
>>> Followed later by:
>>> [   21.286920] net eth0: initializing cpsw version 1.12 (0)
>>> [   21.301166] net eth0: phy found : id is : 0x7c0f1
>>>
>>> On a 'bad phy' boot I see the following (differences highlighted):
>>> [    2.806763] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
>>> [    2.813213] davinci_mdio 4a101000.mdio: detected phy mask *fffffffb*
>>> [    2.829512] libphy: 4a101000.mdio: probed
>>> [    2.833875] davinci_mdio 4a101000.mdio: phy[2]: device 4a101000.mdio:02, driver unknown
>>>
>>> Followed later by:
>>> [   21.346861] net eth0: initializing cpsw version 1.12 (0)
>>> [   21.354379] *libphy: PHY 4a101000.mdio:00 not found*
>>> [   21.359469] *net eth0: phy 4a101000.mdio:00 not found on slave 0*
>>>
>>>
>>> So it looks like the 'davinci_mdio_reset' function sees the phy in both
>>> instances, but reports differently on the bad boot. I am not sure what to
>>> make of this.
>>>
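
One way to read the two "detected phy mask" values in the logs above is that a clear bit marks an MDIO address where a PHY answered. That reading is an inference from the logs themselves, which pair mask fffffffe with phy[0] and fffffffb with phy[2]. The helper name below is hypothetical.

```python
def phy_addresses(mask_hex):
    """Return MDIO addresses whose bit is clear in a 32-bit phy mask."""
    mask = int(mask_hex, 16)
    return [addr for addr in range(32) if not mask & (1 << addr)]
```

Read this way, the bad boot shows the PHY answering at address 2 while the later lookup still asks for 4a101000.mdio:00, which would explain the "not found" messages.
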
>>> I am using the Debian 7.2 Rootfs and the 'RobertCNelson' kernel
>>> '3.12.0-bone8'.
>>>
>>>
>>>
>>> Regards,
>>> Andrew.
>>>
>>>
>>>
>
