Thanks, Chuck.  I know this is difficult for the people affected, and that 
weighs on us as well, because it is never a good thing to have issues like this 
going on.

Regarding memory coherency – we do have multiple data masters which also 
complicates the problem and the debugging.  My use of the term was more along 
the lines of memory stability – what we write, we read back, and what we read 
once, we can read again with the same value.  Those things which are (and 
should be) taken for granted are no longer true once this issue hits and you 
can see how that could lead to any number of “strange” and seemingly unrelated 
issues.  And they mostly look like garbage writers, which software developers 
are always under suspicion for creating!
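
The two stability properties described above (what we write, we read back; what we read once, we read again with the same value) can be illustrated with a minimal sketch. This is not Cambium's memory test. The function name, the `write`/`read` callbacks, and the bytearray standing in for real memory are all hypothetical, chosen only to show the shape of such a check.

```python
from typing import Callable, List


def check_memory_stability(
    write: Callable[[int, int], None],
    read: Callable[[int], int],
    addresses: range,
    pattern: int = 0xA5,
) -> List[int]:
    """Return addresses that fail the write/read-back or repeated-read check."""
    bad = []
    for addr in addresses:
        expected = (pattern ^ addr) & 0xFF  # vary the byte written per address
        write(addr, expected)
        first = read(addr)   # property 1: what we write, we read back
        second = read(addr)  # property 2: a second read returns the same value
        if first != expected or second != first:
            bad.append(addr)
    return bad


# Hypothetical stand-in for hardware: a plain bytearray acts as stable memory.
ram = bytearray(64)


def ram_write(addr: int, value: int) -> None:
    ram[addr] = value


def ram_read(addr: int) -> int:
    return ram[addr]


def flaky_read(addr: int) -> int:
    # Simulated fault (an assumption for illustration): one address flips a bit.
    value = ram[addr]
    return value ^ 0x01 if addr == 5 else value
```

With the stable stand-in the check reports no failing addresses, while the simulated fault is reported at the one address it corrupts.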

-Aaron


From: Af [mailto:[email protected]] On Behalf Of Chuck McCown
Sent: Thursday, January 21, 2016 6:59 PM
To: [email protected]
Subject: Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- 
Replace the bad ePMP units.

Nice report.  I wish all companies would fess up like this when there is an 
issue.
I can see where memory issues can manifest in a large number of symptoms.  Good 
that you can reproduce it.  I have been in your situation many times even 
creating custom hardware to catch other hardware messing up so we could figure 
out what is happening.  Not fun until you can reproduce it.

I am not familiar with the term memory coherency.  I understand laser 
coherency.  And phase coherency and even spread spectrum coherency.   So I 
looked it up.  I think we used to have a different name for this... like 
asynchronous bus arbiter?

From: Aaron Schneider<mailto:[email protected]>
Sent: Thursday, January 21, 2016 5:50 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- 
Replace the bad ePMP units.

Hi Everyone –

Sorry for the delay in response on this thread.  I’d like to give an update of 
where we are with this issue.

First off, I would like to apologize for the issues that this is causing.  We 
have heard reports for a while in varying fashion, and Tushar had been talking 
about having things like this for quite some time, but we were having issues 
finding some correlation between reports (configuration, network topology, 
etc), as well as being unable to recreate the issue in our lab on demand.  This 
issue appears to have definitely got worse in the 13.4 release and is becoming 
more widespread as the weather turns.

What we have found out in the last several weeks is that there is an issue with 
the memory controller code in the FPGA.  What this leads to is memory coherency 
being lost which actually has now been verified to lead to several issues.  We 
had seen reports of various resets over time but had no reason to correlate 
them to one root cause until now.   The most prevalent of these is the Watchdog 
Reset without any accompanying crash log.  The other issues with the same root 
cause are the Illegal Instruction crash, the Invalid NiBuf crash, as well as 
any Null Exception Handler crash.  The bottom line is, when memory contents 
glitch underneath your software, the outcome depends on when it happens.  We 
have found this to be very reproducible at very cold temperatures (-20C to 
-50C), but it has been seen and reported at higher temperatures, just not as 
often.

The nature of the FPGA based memory controller is that there can be timing 
issues that get exacerbated at extreme temperatures.  If you don’t have proper 
constraints in place for a given signal path, its timing characteristics can 
change on you as temperature changes.  Also, if you don’t have a proper 
constraint in place, even recompiling the FPGA can change the characteristics 
that then make what used to work fine susceptible to extremes.   Something 
happened with the 13.4 FPGA that brought this to the edge such that it is now a 
problem and, as we are seeing with the winter cold coming in, is becoming much 
more prevalent at cold temperatures.  13.4 and 13.4.1 have the same FPGA.  14.1.2 
has a new FPGA and there have been some improvements made in this area, but we 
have found it is still susceptible to the problem.

We are reproducing the problem in our lab and we have multiple developers 
digging in to figure out what is going on.  These types of issues with timing 
are generally very difficult to find and fix, but this is our highest priority 
right now and we will not have another release until this is fixed.

I’ve talked mostly about 13.4 and 13.4.1 here, but the nature of this issue and 
how it can interact with hardware doesn’t preclude it from having been the 
cause of the issues some (like Tushar) have seen over time.  Once we have a fix 
for this, we will be adding more rigorous regression testing including an 
internal HW memory test to validate that this type of memory issue doesn’t come 
back again.

From what we’ve seen and heard, this issue only affects the 450 AP FPGA and is 
not an issue on the 450 SM, 430AP/SM, nor the 450i devices.  The 450i is a very 
different architecture and has a hardware based memory controller and watchdog 
timer whereas on the 450/430 based devices, these items are in the FPGA.


Again, I apologize for the severe inconvenience and realize that it is getting 
colder and colder in NA so we are racing against the clock with this.   As soon 
as we have any updates and new open beta loads with a fix, I’ll let you know.

I appreciate your patience.

Regards,
-Aaron

From: Af [mailto:[email protected]] On Behalf Of Brian Sullivan
Sent: Thursday, January 21, 2016 4:11 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- 
Replace the bad ePMP units.

I was assured today that the issue isn't the hardware.  Evidently this issue 
can be solved with an upcoming software upgrade.
Time will tell.
http://community.cambiumnetworks.com/t5/PMP-450/13-2-to-13-4-System-Reset-Exception-Watchdog-Reset/td-p/43347/page/2
On 1/21/2016 4:02 PM, Joe Falaschi wrote:
We have some APs that have uptime over 60 days but many reboot every 1-3 weeks. 
This is definitely an outlier.  We've been in contact with Cambium on this 
via an open ticket and sending them all of the information they request and 
nobody has said oh gosh that is bad hardware RMA it.  So, we're just going 
around and around.  We'll end up just replacing it and hoping they will take 
it back because obviously this is bad.  We are running 14.x per their 
request.  We saw this on 13.x as well.

Joe


On Jan 21, 2016, at 12:05 PM, Ken Hohhof wrote:

Joe, that is seriously bad.  I see watchdog resets and a few stack dumps, but 
uptime on 450 APs is typically 2-4 weeks, despite the recent cold weather, in 
fact I don’t think it has been more common than it was last summer.  I have 
not gone to 14.x though, everything is still on 13.2.

So either you have a bad unit, or 14.x is making it much worse.  If everyone 
was seeing resets every few minutes or hours, I think there would be villagers 
with torches and pitchforks outside Cambium HQ.

Brian from FVI does have a thread on the Cambium Community about this.

FWIW, I have one 450i 900 MHz which necessarily is on 14.1, and it does not 
appear to be having watchdog resets.  Lightly loaded however, just 2 subs.

From: Joe Falaschi<mailto:[email protected]>
Sent: Thursday, January 21, 2016 11:34 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- 
Replace the bad ePMP units.

We see a ton of reboots on the 450 platform as well.  It's getting pretty 
frustrating simply because this is such a long term issue.  One of my APs has 
rebooted 195 times (now running 14.1.2).  They are saying we should replace 
the AP but it is unclear if we can RMA it or not.  We do have an open ticket.

Joe Falaschi
e-vergent

<Screen Shot 2016-01-21 at 11.30.16 AM.png>
On Jan 20, 2016, at 9:26 PM, Mark Radabaugh wrote:

Hum... sounds very similar.  It’s temperature sensitive as well - 
gets far worse with low temperatures, and we are having pretty cold temps this 
week.

Extremely frustrating and causing real customer complaints.

Mark

On Jan 20, 2016, at 9:28 PM, Tushar Patel 
<[email protected]<mailto:[email protected]>> wrote:

Over two years we have been seeing random reboots. We were told over and over 
again you are the only one.  Then a few people started reporting.

But Cambium never could get to the bottom of the problems for two years, so I 
gave up on Cambium fixing this random reboot.  We stopped calling them about it.

As the new versions of the software have come out over two years we have seen 
the frequency of the problem reduce but not go away.

Tushar

On Jan 20, 2016, at 6:25 PM, Mark Radabaugh 
<[email protected]<mailto:[email protected]>> wrote:
Tushar,

What did you give up on?  Or do?

Please note the mailing and shipping address change below:

Mark Radabaugh
Amplex
22690 Pemberville Rd
Luckey, OH 43443
419-837-5015 x1021
[email protected]<mailto:[email protected]>

On Jan 20, 2016, at 4:49 PM, Tushar Patel 
<[email protected]<mailto:[email protected]>> wrote:

That's what they used to tell us too.  We have given up on the subject now.

Tushar

On Jan 20, 2016, at 1:09 PM, Mark Radabaugh 
<[email protected]<mailto:[email protected]>> wrote:
Wait - they keep telling us we are the only ones that this happens to with 450?

So who else is having reboot-o-rama with 450’s?

Mark

On Jan 20, 2016, at 1:20 PM, Brian Sullivan 
<[email protected]<mailto:[email protected]>> wrote:

I wish they would fix/replace the bad 450 AP's that suffer from Watchdog 
Resets.
Although replacing 100 450 AP's is cheaper than ePMP.  :-/
On 1/20/2016 12:11 PM, Josh Luthman wrote:
Why would making the memory faster degrade performance?

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

On Wed, Jan 20, 2016 at 1:00 PM, Tyson Burris @ Internet Communications Inc 
<[email protected]<mailto:[email protected]>> wrote:
Hello Cambium,

At the MidWest-IX launch party last night, several of us Indiana WISPs compared 
notes on the “cold weather” problems we are seeing with ePMPs.  It was 
very interesting to learn we are experiencing identical problems across the 
spectrum.
We all understand this is a DRAM issue with certain units you have 
identified.  We also understand that a firmware RC has been made available 
to fix this short term.
The bottom line is we are very frustrated and are growing tired of dealing with it.

Our concern is simple.  If your software fix “degrades” the performance 
of the product or triggers other issues, as it has been suggested, we would 
prefer a full recall and replacement program immediately.

If the suggestion that the fix will degrade the product performance is 
inaccurate and the fix will not cause other issues, I would like for this to be 
made public.

Thank you,

Tyson Burris, President
Internet Communications Inc.
739 Commerce Dr.
Franklin, IN 46131

317-738-0320<tel:317-738-0320> Daytime #
317-412-1540<tel:317-412-1540> Cell/Direct #
Online: www.surfici.net<http://www.surfici.net>

<Mail Attachment.png>
What can ICI do for you?

Broadband Wireless - PtP/PtMP Solutions - WiMax - Mesh Wifi/Hotzones - IP 
Security - Fiber - Tower - Infrastructure.

CONFIDENTIALITY NOTICE: This e-mail is intended for the
addressee shown. It contains information that is
confidential and protected from disclosure. Any review,
dissemination or use of this transmission or its contents by
unauthorized organizations or individuals is strictly
prohibited.