Pi and console cable???

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

On Jan 21, 2016 8:05 PM, "Chuck McCown" <[email protected]> wrote:
> I am guessing that if it is caused by a fight between two CPUs/FPGAs for common memory, any dumps would be different each time. You would actually have to put a hardware logic analyzer on the pins of the chip to catch it.
>
> *From:* Josh Luthman <[email protected]>
> *Sent:* Thursday, January 21, 2016 5:58 PM
> *To:* [email protected]
> *Subject:* Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- Replace the bad ePMP units.
>
> Would it be helpful to have a test or memory dump load for the APs it's happening consistently on? Rather than reproducing it in the lab, just use real repeating units.
>
> On Jan 21, 2016 7:50 PM, "Aaron Schneider" <[email protected]> wrote:
>
>> Hi Everyone –
>>
>> Sorry for the delay in responding on this thread. I'd like to give an update on where we are with this issue.
>>
>> First off, I would like to apologize for the issues that this is causing. We have heard reports for a while in varying fashion, and Tushar had been talking about having issues like this for quite some time, but we were having trouble finding any correlation between reports (configuration, network topology, etc.), as well as being unable to recreate the issue in our lab on demand. This issue appears to have definitely gotten worse in the 13.4 release and is becoming more widespread as the weather turns.
>>
>> What we have found out in the last several weeks is that there is an issue with the memory controller code in the FPGA. This leads to a loss of memory coherency, which has now been verified to cause several issues. We had seen reports of various resets over time but had no reason to correlate them to one root cause until now. The most prevalent of these is the Watchdog Reset without any accompanying crash log.
>> The other issues with the same root cause are the Illegal Instruction crash, the Invalid NiBuf crash, and any Null Exception Handler crash. The bottom line is that when memory contents glitch under your software, the outcome depends on when the glitch happens. We have found this to be very reproducible at very cold temperatures (-20C to -50C), but it has been seen and reported at higher temperatures, just not as often.
>>
>> The nature of the FPGA-based memory controller is that there can be timing issues that get exacerbated at extreme temperatures. If you don't have proper constraints in place for a given signal path, its timing characteristics can change on you as temperature changes. Also, if you don't have a proper constraint in place, even recompiling the FPGA can change those characteristics, making what used to work fine susceptible to extremes. Something happened with the 13.4 FPGA that brought this to the edge, so that it is now a problem and, as we are seeing with the winter cold coming in, much more prevalent at cold temperatures. 13.4 and 13.4.1 have the same FPGA. 14.1.2 has a new FPGA and some improvements have been made in this area, but we have found it is still susceptible to the problem.
>>
>> We are reproducing the problem in our lab and we have multiple developers digging in to figure out what is going on. These types of timing issues are generally very difficult to find and fix, but this is our highest priority right now and we will not have another release until this is fixed.
>>
>> I've talked mostly about 13.4 and 13.4.1 here, but the nature of this issue and how it can interact with hardware doesn't preclude it from having been the cause of the issues some (like Tushar) have seen over time.
>> Once we have a fix for this, we will be adding more rigorous regression testing, including an internal HW memory test, to validate that this type of memory issue doesn't come back.
>>
>> From what we've seen and heard, this issue only affects the 450 AP FPGA and is not an issue on the 450 SM, the 430 AP/SM, or the 450i devices. The 450i is a very different architecture and has a hardware-based memory controller and watchdog timer, whereas on the 450/430-based devices these items are in the FPGA.
>>
>> Again, I apologize for the severe inconvenience and realize that it is getting colder and colder in NA, so we are racing against the clock with this. As soon as we have any updates and new open beta loads with a fix, I'll let you know.
>>
>> I appreciate your patience.
>>
>> Regards,
>>
>> -Aaron
>>
>> *From:* Af [mailto:[email protected]] *On Behalf Of *Brian Sullivan
>> *Sent:* Thursday, January 21, 2016 4:11 PM
>> *To:* [email protected]
>> *Subject:* Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- Replace the bad ePMP units.
>>
>> I was assured today that the issue isn't the hardware. Evidently this issue can be solved with an upcoming software upgrade.
>> Time will tell.
>>
>> http://community.cambiumnetworks.com/t5/PMP-450/13-2-to-13-4-System-Reset-Exception-Watchdog-Reset/td-p/43347/page/2
>>
>> On 1/21/2016 4:02 PM, Joe Falaschi wrote:
>>
>> We have some APs that have uptime over 60 days, but many reboot every 1-3 weeks. This is definitely an outlier. We've been in contact with Cambium on this via an open ticket and have sent them all of the information they requested, and nobody has said "oh gosh, that is bad hardware, RMA it." So we're just going around and around. We'll end up just replacing it and hoping they will take it back, because obviously this is bad. We are running 14.x per their request. We saw this on 13.x as well.
>>
>> Joe
>>
>> On Jan 21, 2016, at 12:05 PM, Ken Hohhof wrote:
>>
>> Joe, that is seriously bad. I see watchdog resets and a few stack dumps, but uptime on 450 APs is typically 2-4 weeks despite the recent cold weather; in fact, I don't think resets have been more common than they were last summer. I have not gone to 14.x, though; everything is still on 13.2.
>>
>> So either you have a bad unit, or 14.x is making it much worse. If everyone was seeing resets every few minutes or hours, I think there would be villagers with torches and pitchforks outside Cambium HQ.
>>
>> Brian from FVI does have a thread on the Cambium Community about this.
>>
>> FWIW, I have one 450i 900 MHz, which is necessarily on 14.1, and it does not appear to be having watchdog resets. It is lightly loaded, however, with just 2 subs.
>>
>> *From:* Joe Falaschi <[email protected]>
>> *Sent:* Thursday, January 21, 2016 11:34 AM
>> *To:* [email protected]
>> *Subject:* Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- Replace the bad ePMP units.
>>
>> We see a ton of reboots on the 450 platform as well. It's getting pretty frustrating simply because this is such a long-term issue. One of my APs has rebooted 195 times (now running 14.1.2). They are saying we should replace the AP, but it is unclear whether we can RMA it. We do have an open ticket.
>>
>> Joe Falaschi
>> e-vergent
>>
>> <Screen Shot 2016-01-21 at 11.30.16 AM.png>
>>
>> On Jan 20, 2016, at 9:26 PM, Mark Radabaugh wrote:
>>
>> Hum... sounds very similar. It's temperature sensitive as well; it gets far worse with low temperatures, and we are having pretty cold temps this week.
>>
>> Extremely frustrating and causing real customer complaints.
>>
>> Mark
>>
>> On Jan 20, 2016, at 9:28 PM, Tushar Patel <[email protected]> wrote:
>>
>> Over two years we have been seeing random reboots. We were told over and over again, "you are the only one." Then a few people started reporting it.
>>
>> But Cambium could never get to the bottom of the problem in two years, so I gave up on Cambium fixing this random reboot. We stopped calling them about it.
>>
>> As new versions of the software have come out over two years, we have seen the frequency of the problem go down, but it has not gone away.
>>
>> Tushar
>>
>> On Jan 20, 2016, at 6:25 PM, Mark Radabaugh <[email protected]> wrote:
>>
>> Tushar,
>>
>> What did you give up on? Or do?
>>
>> Please note the mailing and shipping address change below:
>>
>> Mark Radabaugh
>> Amplex
>> 22690 Pemberville Rd
>> Luckey, OH 43443
>> 419-837-5015 x1021
>> [email protected]
>>
>> On Jan 20, 2016, at 4:49 PM, Tushar Patel <[email protected]> wrote:
>>
>> That's what they used to tell us too. We have given up on the subject now.
>>
>> Tushar
>>
>> On Jan 20, 2016, at 1:09 PM, Mark Radabaugh <[email protected]> wrote:
>>
>> Wait - they keep telling us we are the only ones this happens to with the 450?
>>
>> So who else is having reboot-o-rama with 450s?
>>
>> Mark
>>
>> On Jan 20, 2016, at 1:20 PM, Brian Sullivan <[email protected]> wrote:
>>
>> I wish they would fix/replace the bad 450 APs that suffer from Watchdog Resets.
>> Although replacing 100 450 APs is cheaper than ePMP. :-/
>>
>> On 1/20/2016 12:11 PM, Josh Luthman wrote:
>>
>> Why would making the memory faster degrade performance?
>>
>> On Wed, Jan 20, 2016 at 1:00 PM, Tyson Burris @ Internet Communications Inc <[email protected]> wrote:
>>
>> Hello Cambium,
>>
>> At the MidWest-IX launch party last night, several of us Indiana WISPs compared notes on the "cold weather" problems we are seeing with ePMPs. It was very interesting to learn we are experiencing identical problems across the spectrum.
>>
>> We all understand this is a DRAM issue with certain units you have identified. We also understand the firmware RC that has been made available to fix this short term.
>>
>> The bottom line is that we are very frustrated and growing tired of dealing with it.
>>
>> Our concern is simple. If your software fix "degrades" the performance of the product or triggers other issues, as has been suggested, we would prefer a full recall and replacement program immediately.
>>
>> If the suggestion that the fix will degrade product performance is inaccurate, and the fix will not cause other issues, I would like this to be made public.
>>
>> Thank you,
>>
>> *Tyson Burris, President*
>> *Internet Communications Inc.*
>> *739 Commerce Dr.*
>> *Franklin, IN 46131*
>> *317-738-0320 Daytime #*
>> *317-412-1540 Cell/Direct #*
>> *Online: www.surfici.net*
