On Wed, Oct 19, 2016 at 12:54 PM, William Hermans <[email protected]> wrote:
>
> On Wed, Oct 19, 2016 at 3:24 AM, Graham <[email protected]> wrote:
>>
>> I have two BBG units that I use as headless servers, with access only
>> through Ethernet. Both have been running without a reboot for multiple
>> months without any issues. I think I mentioned that I did have a BBB do
>> exactly what you describe while running as a headless server last year,
>> but at the time there was a thunderstorm in the area, with lightning
>> strikes in the neighborhood. It recovered on reboot and has never
>> repeated the symptom.
>>
>> So my conclusion is that it can happen, but rarely, and in my case it
>> was probably caused by an electrical transient coming in on the Ethernet
>> connection, which is routed from a cable modem to the outside world.
>>
>> For a high-reliability application, perhaps add some extra transient
>> protection on the Ethernet connection, and some kind of "ping monitor"
>> that can auto-reboot the BBG.
>>
>> --- Graham
>
> I didn't have a BBG to play with until the last 2-3 months. Since then
> I've had ~30 boards over the course of the last 2 months to observe this
> behavior on, and it has only happened once. So I attributed what happened
> to me accidentally knocking the board around a little, until I talked
> with another person I know who has experienced this issue with multiple
> kernels, multiple times, over the last, I don't know . . . maybe 6 months.
>
> So what I did was first install the same Debian image he was using, then
> change kernels to the *bone* LTS kernel. I removed g_ether by removing
> Robert's custom boot script for the 335x evm board. After that I got the
> project files from this person and duplicated his software setup, which
> is an MQTT application with a custom cape.
>
> Anyway, I was running this software last night, and then I downloaded and
> ran nload from an ssh session. But I keep getting ssh "Broken pipe"
> errors, which is not necessarily a concern. I've seen that before. I
> intend to hook up a serial debug cable and run nload from that, but just
> have not gotten around to it.
>
> One thing on my mind is that perhaps the software this person wrote is
> somehow failing to deal with a "busy network" properly. Meaning, if the
> internet connection is bandwidth saturated and the application is for
> some reason unable to deal with a "stale connection", how will it act?
> However, I would not think this should cause the hardware to fail, and a
> hardware failure is what I'm seeing: the Ethernet traffic indication LEDs
> stop functioning and the Ethernet connection becomes non-functional. What
> I was able to observe so far is that this application sends around
> 8-9 kbit/s of data and gets 2-3 kbit/s back.
>
> Another concern: MQTT by default is an inherently insecure protocol, and
> this app does currently run as root. However, there are several caveats
> to this concern. First, the application is a peer-to-peer design in that
> only the MQTT broker can communicate with the board, whether it sends
> commands or collects data back from the board. Second, MQTT is able to
> use certificates, however I do not think that is currently the case with
> this software *YET*. I've given this person the standard security lecture
> on running as root, locking things down, etc. We just have not acted on
> it yet.
>
> With all of the above mentioned: when I ran into this issue myself, I was
> not running anything other than a stock image and the stock software that
> comes with it, and the board was just idling for 5-6 days, maybe a little
> longer. I ran uptime from an ssh session, where it reported back "5 days
> . . .", after which this happened. So I'm more inclined to think this is
> most likely not a userspace application issue.
>
> I'm not even sure where to go from here as far as tracking this issue
> down. All I can really do is throw everything I know / have at the board,
> and hope I get an error trapped from the live kernel log through serial.
I think it's related to suspend/cpuidle. I know another user was having
issues where they had to ping the board twice, as the first ping would
never get a response.

One thing that might help: remove the sleep pinmux entries from
mac/davinci_mdio:

https://github.com/RobertCNelson/dtb-rebuilder/blob/4.4-ti/src/arm/am335x-bone-common.dtsi#L370-L383

Regards,

--
Robert Nelson
https://rcn-ee.com/
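
As a stopgap while the root cause is tracked down, the "ping monitor" Graham suggested earlier in the thread could look roughly like the sketch below. It is only an illustration under several assumptions that nobody in the thread stated: it runs on the board itself as root, it pings a hypothetical gateway address (192.168.1.1), it uses the Linux iputils `ping` options `-c` and `-W`, it reboots through `/sbin/reboot`, and the thresholds (5 consecutive failures, 60 seconds apart) are arbitrary.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to `host` is answered.

    Assumes the Linux iputils `ping` (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reboot():
    """Reboot the board. Assumes root privileges and /sbin/reboot."""
    subprocess.run(["/sbin/reboot"])


def monitor(host, probe=ping_once, on_dead=reboot, max_failures=5,
            interval_s=60, sleep=time.sleep, max_rounds=None):
    """Probe `host` periodically; call `on_dead` after `max_failures`
    consecutive failed probes.

    Returns True if `on_dead` was called, or False if `max_rounds`
    probes completed without reaching the failure threshold.
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if probe(host):
            failures = 0          # any reply resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                on_dead()
                return True
        sleep(interval_s)
    return False


# Example (hypothetical gateway address; runs until the link is judged dead):
# monitor("192.168.1.1")
```

Requiring several consecutive failures keeps a single dropped ping, like the "first ping never responds" symptom mentioned above, from triggering a reboot. The probe, the sleep function, and the reboot action are passed in as parameters so the loop can be exercised without touching the network or restarting the board.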
