Where I have seen this, it seems to be a timing problem.  It’s a 4-sector 
site, 3.65 450 with a SyncInjector for timing.  The ‘south’ AP will slowly 
degrade, with SNR getting worse over about 24 hours, frame utilization rising to 
100%, and throughput dropping.  I don’t usually let it run long enough this 
way to trigger LBTs, but I have seen them on occasion.  A reboot usually fixes 
it for a while.

We swapped APs a while back with no change, and also swapped SyncInjectors.  
The ‘south’ AP does pick up GPS internally, so I currently have it running with 
sync from the internal GPS + freerun.  The AP seems stable this way, except that 
the internal GPS drops out on occasion, goes into freerun, and then ‘jumps’ 
timing when it comes back a few minutes later, causing reregistrations and/or 
LBT events.

Next up in trying to fix this is to take a CTM out and swap that in to see if 
it makes any difference.  We have plenty of 3650 sites running with 
SyncInjectors so I’m a bit confused, but so far this really feels like a timing 
issue.

Mark

> On Jan 21, 2016, at 9:56 PM, George Skorup <[email protected]> wrote:
> 
> When in doubt, try to kill it with fire... or ice.
> 
> I wonder if this is related. I've had a couple 3.6 clusters start randomly 
> dropping sessions the past week or so while it's been cold. Most SMs can't 
> re-register. SMs with HP definitely cannot re-register and say the HP VC was 
> stuck and cleared a few times in their logs. APs are rebooted and all is 
> clear. Mostly at night. I figured it was some traffic overload condition, 
> until it happened at 4am where the traffic is at minimum.
> 
> Then similar things happened back during the summer, except they kept failing 
> to register due to "out of range" in the reg fail list. Again, have to reboot 
> the APs to fix it.
> 
> And in both cases, a LBT hit seems to trigger this. However, I've seen the 
> same thing happen on 5.7 sectors where there is obviously no LBT.
> 
> Or how about sync, no sync, sync, no sync, sync, no sync until the AP is 
> rebooted.
> 
> Can bad stuff in memory do all kinds of weird shit just like this? I hope 
> this is the root of all this, because I'm out of things to try.. and sanity.
> 
> On 1/21/2016 6:50 PM, Aaron Schneider wrote:
>> Hi Everyone,
>>
>> Sorry for the delay in response on this thread.  I'd like to give an 
>> update of where we are with this issue.
>>
>> First off, I would like to apologize for the issues that this is 
>> causing.  We have heard reports for awhile in varying fashion, and Tushar 
>> had been talking about having things like this for quite some time, but we 
>> were having issues finding some correlation between reports (configuration, 
>> network topology, etc), as well as being unable to recreate the issue in our 
>> lab on demand.  This issue appears to have definitely got worse in the 
>> 13.4 release and is becoming more widespread as the weather turns.
>>
>> What we have found out in the last several weeks is that there is an issue 
>> with the memory controller code in the FPGA.  What this leads to is memory 
>> coherency being lost which actually has now been verified to lead to several 
>> issues.  We had seen reports of various resets over time but had no reason 
>> to correlate them to one root cause until now.  The most prevalent of 
>> these is the Watchdog Reset without any accompanying crash log.  The other 
>> issues with the same root cause are the Illegal Instruction crash, the 
>> Invalid NiBuf crash, as well as any Null Exception Handler crash.  The 
>> bottom line is, when memory contents glitch on your software, it depends on 
>> when it happens as to what the outcome is.  We have found this to be very 
>> reproducible at very cold temperatures (-20C -- -50C), but it has been seen 
>> and reported at higher temperatures, just not as often.
>>
>> The nature of the FPGA based memory controller is that there can be timing 
>> issues that get exacerbated at extreme temperatures.  If you don't have 
>> proper constraints in place for a given signal path, its timing 
>> characteristics can change on you as temperature changes.  Also, if you 
>> don't have a proper constraint in place, even recompiling the FPGA can 
>> change the characteristics that then make what used to work fine susceptible 
>> to extremes.  Something happened with the 13.4 FPGA that brought this 
>> to the edge such that it is now a problem and as we are seeing with winter 
>> cold coming in, becoming much more prevalent at cold temperatures.  13.4 
>> and 13.4.1 have the same FPGA.  14.1.2 has a new FPGA and there have been 
>> some improvements made in this area, but we have found it is still 
>> susceptible to the problem.
>>
>> We are reproducing the problem in our lab and we have multiple developers 
>> digging in to figure out what is going on.  These types of issues with 
>> timing are generally very difficult to find and fix, but this is our highest 
>> priority right now and we will not have another release until this is 
>> fixed.
>>
>> I've talked mostly about 13.4 and 13.4.1 here, but the nature of this 
>> issue and how it can interact with hardware doesn't preclude it from 
>> having been the cause of the issues some (like Tushar) have seen over 
>> time.  Once we have a fix for this, we will be adding more rigorous 
>> regression testing including an internal HW memory test to validate that 
>> this type of memory issue doesn't come back again.
>>
>> From what we've seen and heard, this issue only affects the 450 AP FPGA 
>> and is not an issue on the 450 SM, 430AP/SM, nor the 450i devices.  The 
>> 450i is a very different architecture and has a hardware based memory 
>> controller and watchdog timer whereas on the 450/430 based devices, these 
>> items are in the FPGA.
>>
>> Again, I apologize for the severe inconvenience and realize that it is 
>> getting colder and colder in NA so we are racing against the clock with 
>> this.  As soon as we have any updates and new open beta loads with a 
>> fix, I'll let you know.
>>
>> I appreciate your patience.
>>
>> Regards,
>> -Aaron
>>
>> From: Af [mailto:[email protected] <mailto:[email protected]>] On 
>> Behalf Of Brian Sullivan
>> Sent: Thursday, January 21, 2016 4:11 PM
>> To: [email protected] <mailto:[email protected]>
>> Subject: Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With 
>> Love- Replace the bad ePMP units.
>>
>> I was assured today that the issue isn't the hardware.  Evidently this 
>> issue can be solved with an upcoming software upgrade.
>> Time will tell.
>> http://community.cambiumnetworks.com/t5/PMP-450/13-2-to-13-4-System-Reset-Exception-Watchdog-Reset/td-p/43347/page/2
>> On 1/21/2016 4:02 PM, Joe Falaschi wrote:
>> We have some APs that have uptime over 60 days but many reboot every 1-3 
>> weeks.  This is definitely an outlier.  We've been in contact with 
>> Cambium on this via an open ticket and sending them all of the information 
>> they request and nobody has said oh gosh that is bad hardware RMA it.  So, 
>> we're just going around and around.  We'll end up just replacing it and 
>> hoping they will take it back because obviously this is bad.  We are 
>> running 14.x per their request.  We saw this on 13.x as well. 
>>
>> Joe
>>
>> On Jan 21, 2016, at 12:05 PM, Ken Hohhof wrote:
>> 
>> 
>> Joe, that is seriously bad.  I see watchdog resets and a few stack dumps, 
>> but uptime on 450 APs is typically 2-4 weeks, despite the recent cold 
>> weather, in fact I don't think it has been more common than it was last 
>> summer.  I have not gone to 14.x though, everything is still on 13.2.
>>
>> So either you have a bad unit, or 14.x is making it much worse.  If 
>> everyone was seeing resets every few minutes or hours, I think there would 
>> be villagers with torches and pitchforks outside Cambium HQ.
>>
>> Brian from FVI does have a thread on the Cambium Community about this.
>>
>> FWIW, I have one 450i 900 MHz which necessarily is on 14.1, and it does not 
>> appear to be having watchdog resets.  Lightly loaded however, just 2 subs.
>>
>> From: Joe Falaschi <mailto:[email protected]>
>> Sent: Thursday, January 21, 2016 11:34 AM
>> To: [email protected] <mailto:[email protected]>
>> Subject: Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With 
>> Love- Replace the bad ePMP units.
>>
>> We see a ton of reboots on the 450 platform as well.  It's getting pretty 
>> frustrating simply because this is such a long term issue.  One of my APs 
>> has rebooted 195 times (now running 14.1.2).  They are saying we should 
>> replace the AP but it is unclear if we can RMA it or not.  We do have an 
>> open ticket.
>>
>> Joe Falaschi
>> e-vergent
>>
>> <Screen Shot 2016-01-21 at 11.30.16 AM.png>
>> On Jan 20, 2016, at 9:26 PM, Mark Radabaugh wrote:
>> 
>> 
>> Hum... sounds very similar.  It's temperature sensitive as well - 
>> gets far worse with low temperatures, and we are having pretty cold temps 
>> this week.
>>
>> Extremely frustrating and causing real customer complaints.
>>
>> Mark
>>
>> On Jan 20, 2016, at 9:28 PM, Tushar Patel < 
>> <mailto:[email protected]>[email protected] <mailto:[email protected]>> wrote:
>>
>> Over two years we have been seeing random reboots. We were told over and over 
>> again you are the only one.  Then a few people started reporting.
>>
>> But Cambium never could get to the bottom of the problems for two years so I gave 
>> up on Cambium fixing this random reboot.  We stopped calling them about it.
>>
>> As the new versions of the software have come out over two years we have seen 
>> the frequency of the problem reduce, but it has not gone away.
>>
>> Tushar 
>>
>> 
>> On Jan 20, 2016, at 6:25 PM, Mark Radabaugh < 
>> <mailto:[email protected]>[email protected] <mailto:[email protected]>> wrote:
>> 
>> Tushar, 
>>
>> What did you give up on?  Or do?
>>
>> Please note the mailing and shipping address change below:
>>
>> Mark Radabaugh
>> Amplex
>> 22690 Pemberville Rd
>> Luckey, OH 43443
>> 419-837-5015 x1021
>>  <mailto:[email protected]>[email protected] <mailto:[email protected]>
>>
>> On Jan 20, 2016, at 4:49 PM, Tushar Patel < 
>> <mailto:[email protected]>[email protected] <mailto:[email protected]>> wrote:
>>
>> That's what they used to tell us too.  We have given up on the subject 
>> now. 
>>
>> Tushar 
>>
>> 
>> On Jan 20, 2016, at 1:09 PM, Mark Radabaugh < 
>> <mailto:[email protected]>[email protected] <mailto:[email protected]>> wrote:
>> 
>> Wait - they keep telling us we are the only ones that this happens to with 
>> 450?
>>
>> So who else is having reboot-o-rama with 450's?
>>
>> Mark
>>
>> On Jan 20, 2016, at 1:20 PM, Brian Sullivan < 
>> <mailto:[email protected]>[email protected] 
>> <mailto:[email protected]>> wrote:
>>
>> I wish they would fix/replace the bad 450 AP's that suffer from Watchdog 
>> Resets.
>> Although replacing 100 450 AP's is cheaper than ePMP.  :-/
>> 
>> On 1/20/2016 12:11 PM, Josh Luthman wrote:
>> Why would making the memory faster degrade performance?
>>
>> Josh Luthman
>> Office: 937-552-2340
>> Direct: 937-552-2343
>> 1100 Wayne St
>> Suite 1337
>> Troy, OH 45373
>>
>> On Wed, Jan 20, 2016 at 1:00 PM, Tyson Burris @ Internet Communications Inc 
>> < <mailto:[email protected]>[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hello Cambium,
>> 
>>
>> At the MidWest-IX launch party last night, several of us Indiana WISPs 
>> compared notes on the "cold weather" problems we are seeing with 
>> ePMPs.  It was very interesting to learn we are experiencing identical 
>> problems across the spectrum. 
>> We all understand this is a DRAM issue with certain units you have 
>> identified.  We also understand the firmware RC that has been made 
>> available to fix this short term.
>> The bottom line is we are very frustrated and grow tired of dealing with 
>> it.
>>
>> Our concern is simple.  If your software fix "degrades" the 
>> performance of the product or triggers other issues, as it has been 
>> suggested, we would prefer a full recall and replacement program immediately.
>>
>> If the suggestion that the fix will degrade the product performance is 
>> inaccurate and it will not cause other issues, I would like for this to be made 
>> public.
>>
>> Thank you,
>>
>> Tyson Burris, President 
>> Internet Communications Inc. 
>> 739 Commerce Dr. 
>> Franklin, IN 46131 
>>
>> 317-738-0320 <tel:317-738-0320>Daytime # 
>> 317-412-1540 <tel:317-412-1540>Cell/Direct # 
>> Online:  <http://www.surfici.net/>www.surfici.net <http://www.surfici.net/>
>>
>> <Mail Attachment.png>
>> What can ICI do for you?
>> 
>> Broadband Wireless - PtP/PtMP Solutions - WiMax - Mesh Wifi/Hotzones - IP 
>> Security - Fiber - Tower - Infrastructure. 
>>
>> CONFIDENTIALITY NOTICE: This e-mail is intended for the 
>> addressee shown. It contains information that is 
>> confidential and protected from disclosure. Any review, 
>> dissemination or use of this transmission or its contents by 
>> unauthorized organizations or individuals is strictly 
>> prohibited.
>> 
> 
> 
