Re: [AFMUG] Cambium 450 Watchdog resets - was: To Cambium With Love- Replace the bad ePMP units.

Mark Radabaugh Fri, 22 Jan 2016 10:17:05 -0800

I do see the SMs drop session when the unit goes from one sync source to a 
different source - there is definitely a shift between the various sources.   
There are telnet commands in the newer software to disable and enable various 
sync sources without rebooting the unit - very helpful for getting an AP back 
where you want it without a reboot, at the risk of causing a reregistration of 
some or all of the SMs.


If I had to speculate on the actual issue it would be a combination of a couple 
of things.

a) Cambium has never (to my knowledge) published a detailed specification of 
the sync over power requirements.   As a result the 3rd party vendors have had 
to reverse engineer the design.
b) Cambium was not consistent in the timing from their own equipment - 
internal, CMM3, CMM4, iGPS
c) The POE cabling and voltage have changed a number of times (this had to 
happen - nothing that can be done about that, standards move on)
d) The ’sync detection’ circuit in the APs has changed a number of times.  
Again this had to happen, components and designs change.   It’s nearly 
impossible to make them precisely the same.
e) The 3rd party timing source vendors have tried to work around timing issues, 
while Cambium has tried to tweak things to deal with 3rd party timing - and 
they end up confusing each other because the specification in (a) doesn’t exist.

I think I got off topic….    Anyway - since this is sync over power and the 
signal is a chop in the DC voltage (a square wave) a couple of things happen 
when you attach a long cable to it.   The inductance and capacitance of the 
wire distort the square wave.   The leading and trailing edges become slopes, 
possibly with some ringing on top of that.  The detection circuit in the AP has 
to decide where on the slope it’s going to ‘detect’ the timing.  Even if that 
point is precisely the same voltage on every AP revision, the variation in 
cable length, inductance, and capacitance is going to make each installation 
different.   More knobs could be added to the AP, but it’s going to be mighty 
hard to figure out what you’re supposed to set them to.
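
A rough sketch of that threshold-crossing point, in Python.  It assumes the 
cable acts like a single lumped RC low-pass filter, which is a simplification 
(a real cable is a distributed line, and this ignores inductance and ringing).  
The per-metre resistance and capacitance figures and the function names are 
hypothetical placeholders, not measurements of any Cambium or PacketFlux gear.

```python
import math

def threshold_crossing_delay(r_ohms, c_farads, v_step, v_threshold):
    """Time for a first-order RC step response to rise to v_threshold.

    Solves v_step * (1 - exp(-t / (R*C))) = v_threshold for t.
    """
    if not 0 < v_threshold < v_step:
        raise ValueError("threshold must lie strictly between 0 and the step height")
    tau = r_ohms * c_farads
    return -tau * math.log(1.0 - v_threshold / v_step)

# Assumed, illustrative cable constants (not from any datasheet).
R_PER_M = 0.1      # ohms per metre
C_PER_M = 50e-12   # farads per metre

def delay_for_length(length_m, v_step=1.0, v_threshold=0.5):
    """Detection delay for a cable of the given length under the lumped model."""
    return threshold_crossing_delay(
        R_PER_M * length_m, C_PER_M * length_m, v_step, v_threshold
    )

if __name__ == "__main__":
    for length in (10, 30, 90):
        print(f"{length:3d} m -> {delay_for_length(length) * 1e9:.2f} ns")
```

Under these assumptions the detected edge moves later as the cable gets 
longer, and it also moves if the detection threshold changes, so two APs fed 
the same pulse over different cable runs would not "see" it at the same instant.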

Sync over power, while an elegant solution to timing on the original FSK 
radios, has outlived itself - timing is much more critical as data rates go up 
and sync over power isn’t going to cut it in its current form.

So why does this one AP go wonky?   My guess would be an interaction between 
the SyncInjector and something unique about the cabling to this specific AP 
(how and what the cable is strapped to, if the installer did something 
different with drip loops, slack storage, surge suppressor components, etc.) 
that causes the square wave to either ring or slew, and the AP either 
misinterprets this or tries to compensate for it and fails.   That doesn’t 
mean the SyncInjector did anything wrong - it’s probably doing exactly what 
it’s supposed to do.   The AP isn’t necessarily wrong either - it’s doing what 
it can to recover a mangled signal.  

Nobody is truly wrong, it’s just a broken system.  

Mark


> On Jan 22, 2016, at 9:56 AM, George Skorup <[email protected]> wrote:
> 
> OK, wow. I feel so not alone now.
> 
> A week or two ago I had this happen... I have two sites about 8 miles apart. 
> One has a full 3.6 cluster (N,S,E,W) on a SyncInjector. The other is a single 
> NE sector on a Parasitic SyncPipe. Same frequency and none of the SMs on 
> either sector can hear the other APs, they all face away (i.e. the two sites 
> don't line up together for any SM), so standard frequency reuse scenario. We 
> had some speed complaints and I found all of the SMs on the North sector at 
> 8X/1X downlink. Rebooted the whole cluster with no resolution. Figured 
> something happened to that AP. Then I figured I'd check all other APs for any 
> issues. That's when I found this NE sector running on the on-board GPS. 
> Everything looked completely normal on this sector (SM links, etc). As soon 
> as I disabled the on-board GPS to force it back onto the SyncPipe, everything 
> returned to normal at the other site. I still don't understand that. It seems 
> like the iGPS AP's timing was slightly different and interfering with the 
> uplink on the other (the sectors can hear each other) which was causing a ton 
> of retries so it backed off the downlink? That's all I can figure.
> 
> Also, before I started disabling the on-board GPS on all APs in a cluster, 
> one would fail over from power port to on-board and I'd get LBT events. So I 
> do believe there's a difference in the timing, PacketFlux vs on-board, aka 
> CMM3 vs CMM4.
> 
> The other thing I see is dropped sessions when AutoSync switches to or from 
> on-board GPS and timing port/power port (PacketFlux on both). Sometimes it's 
> only the SMs actively passing traffic that go idle. I believe the pulse is 
> different and the AP has to re-align the frame. I could be wrong.
> 
> I've also mentioned this to Cambium multiple times now. I believe the 
> on-board GPS, even with it disabled because they don't actually turn the 
> receiver off, plays a role with AutoSync going stupid. Again, I could be 
> wrong.
> 
> I remember you mentioned that you're seeing very close 3.6 SMs getting fairly 
> bad downlink SNR due to them hearing the opposite AP on the same freq, even 
> with the high f/b OEM Laird sectors. We're definitely seeing that too.
> 
> Anyway, that's all I got for now. I'm not blaming anyone or pointing fingers, 
> mostly because I'm not entirely sure it's Cambium's or PacketFlux's fault, 
> could be both, I don't know. Shit happens. But shit has been happening for a 
> really long time and I'm very frustrated. Aaron knows this, I send him stuff 
> all the time. :)
> 
> On 1/22/2016 5:38 AM, Mark Radabaugh wrote:
>> Inline
>> 
>>> On Jan 22, 2016, at 12:02 AM, George Skorup <[email protected]> wrote:
>>> 
>>> Sounds like you mean that sector's frame timing is drifting? I haven't seen 
>>> that. But I have seen sectors go nuts where all of the SMs show 12dB SNR 
>>> and MIMO-A downlink. For no reason. Reboot the AP and it's gone.
>>> 
>> I do think the sector’s frame timing is drifting.   The end result is 
>> exactly the same thing you are seeing - all the SMs end up with low SNR and 
>> MIMO-A.  Reboot it and it goes away for 8 to 12 hours before it’s back.   
>> It truly smells like timing drift.  I don’t think it keeps drifting though 
>> - I think the AP works its way off to some sort of offset limit on the 
>> timing and then sits there.  Cambium was playing around with something in 
>> the software with a “debounce timing” that (I think) compared the 
>> incoming pulse with the internal clock for a sanity check and then rejected 
>> pulses outside the window.  I’m not sure they didn’t end up putting some 
>> type of averaging or compensating calculation in that might be ending up 
>> working its way to a limit.
>> 
>> 
>>> On full cluster sites, we're using SyncInjectors, and only SyncInjectors. 
>>> I.E. I disable the on-board and timing ports on all sectors. If the sync 
>>> pulse from the injector drops, oh well. FreeRun is a PITA especially with 
>>> LBT.
>> That’s my usual practice, for the same reasons.
>> 
>>> And I've said this many, many times. There's a difference between the 
>>> on-board GPS which is CMM4 aligned and PacketFlux stuff which is CMM3 
>>> aligned. The frames don't match when you have a mix of this in a cluster.
>> I don’t have any way to measure the on-board timing pulse so I can’t say 
>> where it’s at relative to a sync injector, but I am getting very good 
>> performance with the north AP on the same frequency as the south with one on 
>> a SyncInjector and the other on internal GPS.   SNR on both sectors is 
>> stable and modulation is where I expect it to be.
>> 
>>> It's very obvious when this problem crops up. The AP session list shows 
>>> most SMs sitting at 8X/1X. And sessions with HP show the normal VC as 
>>> 8X/1X, HP VC as 1X/1X.
>> Yep
>> 
>>> I know at one site in particular, the SyncInjector doesn't show any 1PPS 
>>> active events, yet the APs show a few inSync and outSync counts. Those 
>>> could be actual losses in timing, but Forrest said it needs to see 
>>> something like 3-4 in a row before it's logged in the event counter. 
>>> However, at another site, the timing is very stable and the APs show zero 
>>> outSync and one inSync count. But weird things are still happening.
>> Same - the SyncInjector logs don’t show anything unusual, nor does the AP.
>> 
>>> Another thing I've noticed is that a loss in the sync pulse doesn't always 
>>> show up in the AP event log. Or I'll see a loss message, but no acquired 
>>> message after that.
>>> 
>>> And I've been seeing this weird stuff for well over a year. I just don't 
>>> know what else to do. CMMs, CTMs and new radios do not fit in the budget.
>> Using the internal GPS isn’t a viable permanent solution for this, but 
>> it’s working for the moment, and I have not seen a recurrence of the 
>> problem in a week.   I’m going to try a CMM to see if it makes any 
>> difference, and if that fails I’ll probably resort to dragging a timing 
>> port device up there and using that.
>> 
>> Mark
>
