Pi and console cable???

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373
On Jan 21, 2016 8:05 PM, "Chuck McCown" <[email protected]> wrote:

> I am guessing that if it is caused by a fight between two CPUs/FPGAs for
> common memory, any dumps would be different each time.  You would actually
> have to put a hardware logic analyzer on the pins of the chip to catch it.
>
> *From:* Josh Luthman <[email protected]>
> *Sent:* Thursday, January 21, 2016 5:58 PM
> *To:* [email protected]
> *Subject:* Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With
> Love- Replace the bad ePMP units.
>
>
> Would it be helpful to have a test or memory dump load for the APs it's
> happening consistently on?  Rather than reproducing it in the lab, just use
> real repeating units.
>
> Josh Luthman
> Office: 937-552-2340
> Direct: 937-552-2343
> 1100 Wayne St
> Suite 1337
> Troy, OH 45373
> On Jan 21, 2016 7:50 PM, "Aaron Schneider" <
> [email protected]> wrote:
>
>> Hi Everyone –
>>
>>
>>
>> Sorry for the delay in response on this thread.  I’d like to give an
>> update of where we are with this issue.
>>
>>
>>
>> First off, I would like to apologize for the issues that this is
>> causing.  We have heard reports for a while in varying fashion, and Tushar
>> had been talking about having things like this for quite some time, but we
>> were having issues finding some correlation between reports (configuration,
>> network topology, etc.), as well as being unable to recreate the issue in
>> our lab on demand.  This issue appears to have definitely gotten worse in the
>> 13.4 release and is becoming more widespread as the weather turns.
>>
>>
>>
>> What we have found out in the last several weeks is that there is an
>> issue with the memory controller code in the FPGA.  What this leads to is
>> memory coherency being lost which actually has now been verified to lead to
>> several issues.  We had seen reports of various resets over time but had no
>> reason to correlate them to one root cause until now.   The most prevalent
>> of these is the Watchdog Reset without any accompanying crash log.  The
>> other issues with the same root cause are the Illegal Instruction crash,
>> the Invalid NiBuf crash, as well as any Null Exception Handler crash.  The
>> bottom line is, when memory contents glitch underneath your software, the
>> outcome depends on when it happens.  We have found this to be very
>> reproducible at very cold temperatures (-20C to -50C), but it has been seen
>> and reported at higher temperatures, just not as often.
>>
>>
>>
>> The nature of the FPGA based memory controller is that there can be
>> timing issues that get exacerbated at extreme temperatures.  If you don’t
>> have proper constraints in place for a given signal path, its timing
>> characteristics can change on you as temperature changes.  Also, if you
>> don’t have a proper constraint in place, even recompiling the FPGA can
>> change the characteristics that then make what used to work fine
>> susceptible to extremes.   Something happened with the 13.4 FPGA that
>> brought this to the edge such that it is now a problem and as we are seeing
>> with winter cold coming in, becoming much more prevalent at cold
>> temperatures.  13.4 and 13.4.1 have the same FPGA.  14.1.2 has a new FPGA
>> and there have been some improvements made in this area, but we have found
>> it is still susceptible to the problem.
>>
>>
>>
>> We are reproducing the problem in our lab and we have multiple developers
>> digging in to figure out what is going on.  These types of issues with
>> timing are generally very difficult to find and fix, but this is our
>> highest priority right now and we will not have another release until this
>> is fixed.
>>
>>
>>
>> I’ve talked mostly about 13.4 and 13.4.1 here, but the nature of this
>> issue and how it can interact with hardware doesn’t preclude it from having
>> been the cause of the issues some (like Tushar) have seen over time.  Once
>> we have a fix for this, we will be adding more rigorous regression testing
>> including an internal HW memory test to validate that this type of memory
>> issue doesn’t come back again.
>>
>>
>>
>> From what we’ve seen and heard, this issue only affects the 450 AP FPGA
>> and is not an issue on the 450 SM, the 430 AP/SM, or the 450i devices.  The
>> 450i is a very different architecture and has a hardware based memory
>> controller and watchdog timer whereas on the 450/430 based devices, these
>> items are in the FPGA.
>>
>>
>>
>>
>>
>> Again, I apologize for the severe inconvenience and realize that it is
>> getting colder and colder in NA so we are racing against the clock with
>> this.   As soon as we have any updates and new open beta loads with a fix,
>> I’ll let you know.
>>
>>
>>
>> I appreciate your patience.
>>
>>
>>
>> Regards,
>>
>> -Aaron
>>
>>
>>
>> *From:* Af [mailto:[email protected]] *On Behalf Of *Brian Sullivan
>> *Sent:* Thursday, January 21, 2016 4:11 PM
>> *To:* [email protected]
>> *Subject:* Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium
>> With Love- Replace the bad ePMP units.
>>
>>
>>
>> I was assured today that the issue isn't the hardware.  Evidently this
>> issue can be solved with an upcoming software upgrade.
>> Time will tell.
>>
>> http://community.cambiumnetworks.com/t5/PMP-450/13-2-to-13-4-System-Reset-Exception-Watchdog-Reset/td-p/43347/page/2
>>
>> On 1/21/2016 4:02 PM, Joe Falaschi wrote:
>>
>> We have some APs that have uptime over 60 days but many reboot every 1-3
>> weeks.  This is definitely an outlier.  We've been in contact with
>> Cambium on this via an open ticket and sending them all of the information
>> they request, and nobody has said "oh gosh, that is bad hardware, RMA it."
>> So, we're just going around and around.  We'll end up just replacing
>> it and hoping they will take it back because obviously this is bad.  We
>> are running 14.x per their request.  We saw this on 13.x as well.
>>
>>
>>
>> Joe
>>
>>
>>
>>
>>
>> On Jan 21, 2016, at 12:05 PM, Ken Hohhof wrote:
>>
>>
>>
>> Joe, that is seriously bad.  I see watchdog resets and a few stack
>> dumps, but uptime on 450 APs is typically 2-4 weeks, despite the recent
>> cold weather; in fact, I don't think it has been more common than it was
>> last summer.  I have not gone to 14.x though; everything is still on 13.2.
>>
>> So either you have a bad unit, or 14.x is making it much worse.  If
>> everyone was seeing resets every few minutes or hours, I think there would
>> be villagers with torches and pitchforks outside Cambium HQ.
>>
>> Brian from FVI does have a thread on the Cambium Community about this.
>>
>> FWIW, I have one 450i 900 MHz which necessarily is on 14.1, and it does
>> not appear to be having watchdog resets.  Lightly loaded however, just 2
>> subs.
>>
>> *From:* Joe Falaschi <[email protected]>
>>
>> *Sent:* Thursday, January 21, 2016 11:34 AM
>>
>> *To:* [email protected]
>>
>> *Subject:* Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium
>> With Love- Replace the bad ePMP units.
>>
>> We see a ton of reboots on the 450 platform as well.  It's getting
>> pretty frustrating simply because this is such a long-term issue.  One of
>> my APs has rebooted 195 times (now running 14.1.2).  They are saying we
>> should replace the AP but it is unclear if we can RMA it or not.  We do
>> have an open ticket.
>>
>> Joe Falaschi
>>
>> e-vergent
>>
>> <Screen Shot 2016-01-21 at 11.30.16 AM.png>
>>
>> On Jan 20, 2016, at 9:26 PM, Mark Radabaugh wrote:
>>
>>
>>
>> Hum... sounds very similar.  It's temperature sensitive as well
>> - gets far worse with low temperatures, and we are having pretty cold temps
>> this week.
>>
>> Extremely frustrating and causing real customer complaints.
>>
>> Mark
>>
>> On Jan 20, 2016, at 9:28 PM, Tushar Patel <[email protected]> wrote:
>>
>> Over two years we have been seeing random reboots.  We were told over and
>> over again you are the only one.  Then a few people started reporting.
>>
>> But Cambium never could get to the bottom of the problem for two years, so I
>> gave up on Cambium fixing this random reboot.  We stopped calling them about
>> it.
>>
>> As the new versions of the software have come out over two years, we have
>> seen the frequency of the problem reduce, but it has not gone away.
>>
>> Tushar
>>
>>
>> On Jan 20, 2016, at 6:25 PM, Mark Radabaugh <[email protected]> wrote:
>>
>> Tushar,
>>
>> What did you give up on?  Or do?
>>
>> Please note the mailing and shipping address change below:
>>
>> Mark Radabaugh
>> Amplex
>> 22690 Pemberville Rd
>> Luckey, OH 43443
>> 419-837-5015 x1021
>> [email protected]
>>
>> On Jan 20, 2016, at 4:49 PM, Tushar Patel <[email protected]> wrote:
>>
>> That's what they used to tell us too.  We have given up on the subject
>> now.
>>
>> Tushar
>>
>>
>> On Jan 20, 2016, at 1:09 PM, Mark Radabaugh <[email protected]> wrote:
>>
>> Wait - they keep telling us we are the only ones that this happens to
>> with 450?
>>
>> So who else is having reboot-o-rama with 450's?
>>
>> Mark
>>
>> On Jan 20, 2016, at 1:20 PM, Brian Sullivan <[email protected]>
>> wrote:
>>
>> I wish they would fix/replace the bad 450 AP's that suffer from Watchdog
>> Resets.
>> Although replacing 100 450 AP's is cheaper than ePMP.  :-/
>>
>> On 1/20/2016 12:11 PM, Josh Luthman wrote:
>>
>> Why would making the memory faster degrade performance?
>>
>>
>> Josh Luthman
>> Office: 937-552-2340
>> Direct: 937-552-2343
>> 1100 Wayne St
>> Suite 1337
>> Troy, OH 45373
>>
>>
>> On Wed, Jan 20, 2016 at 1:00 PM, Tyson Burris @ Internet Communications
>> Inc <[email protected]> wrote:
>>
>> Hello Cambium,
>>
>>
>> At the MidWest-IX launch party last night, several of us Indiana WISPs
>> compared notes on the "cold weather" problems we are seeing with
>> ePMPs.  It was very interesting to learn we are experiencing identical
>> problems across the spectrum.
>>
>> We all understand this is a DRAM issue with certain units you have
>> identified.  We also understand the firmware RC that has been made
>> available to fix this short term.
>>
>> The bottom line is we are very frustrated and grow tired of dealing with
>> it.
>>
>> Our concern is simple.  If your software fix "degrades" the
>> performance of the product or triggers other issues, as it has been
>> suggested, we would prefer a full recall and replacement program
>> immediately.
>>
>> If the suggestion that the fix will degrade the product performance is
>> inaccurate and the fix will not cause other issues, I would like for this
>> to be made public.
>>
>> Thank you,
>>
>> *Tyson Burris, President*
>> *Internet Communications Inc.*
>> *739 Commerce Dr.*
>> *Franklin, IN 46131*
>> *317-738-0320 Daytime #*
>> *317-412-1540 Cell/Direct #*
>> *Online: **www.surfici.net <http://www.surfici.net>*
>>
>>
>> <Mail Attachment.png>
>>
>> *What can ICI do for you?*
>>
>>
>> *Broadband Wireless - PtP/PtMP Solutions - WiMax - Mesh Wifi/Hotzones -
>> IP Security - Fiber - Tower - Infrastructure.*
>> *CONFIDENTIALITY NOTICE: This e-mail is intended for the*
>> *addressee shown. It contains information that is*
>> *confidential and protected from disclosure. Any review,*
>> *dissemination or use of this transmission or its contents by*
>> *unauthorized organizations or individuals is strictly*
>> *prohibited.*
>>
>
