Hi folks,
The Hello clients (desktop and mobile) had trouble making calls over the
weekend and for part of today. This was due to a problem in the
LoopPush server update. I asked Ben Bengert from the Push Server team
to write a summary of what happened:
On Friday around 11am PST, ops deployed pushgo 1.4rc5 to production
SimplePush and production LoopPush. Errors started occurring almost
immediately. Several hours later a hotfix (1.4rc6) was deployed to
remedy the errors. This fix initially appeared to resolve the
notification delivery errors. On Saturday it was discovered that the
errors had returned, and on Sunday a new fix was written based on
analysis of the code involved. On Monday morning 1.4rc7 was deployed
to production SimplePush, and it has thus far remedied the issue.
The pushgo 1.4 series replaces the prior system in how it handles
inter-node notification routing (amongst many other changes). The
new system uses a peer discovery system backed by etcd such that
each server in the cluster registers itself and then queries etcd to
discover its peers. Due to a bug in how network failures were
handled, if an attempt to check for peers failed, pushgo would
wipe its known list of peers entirely. A similar bug in error
handling resulted in servers that failed to re-register their
presence in etcd being removed from etcd. Fixes for these bugs are
in 1.4rc7 and have held up in the hours since deployment, with no
losses in peer visibility.
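To illustrate the first bug, here is a minimal sketch in Go of a peer
list that survives a failed discovery query. It is not the pushgo
code: PeerLister, PeerSet, Refresh, Peers and errUnreachable are
hypothetical names, and the etcd query is replaced by a plain function
so the example stays self-contained. The only point shown is that the
known list is replaced on a successful lookup and left alone on a
failed one.

```go
package main

import (
	"errors"
	"sync"
)

// PeerLister is a hypothetical stand-in for the etcd query that returns
// the addresses of the currently registered nodes.
type PeerLister func() ([]string, error)

// PeerSet holds the last successfully discovered list of peers.
type PeerSet struct {
	mu    sync.RWMutex
	peers []string
}

// errUnreachable is an illustrative error for a failed discovery query.
var errUnreachable = errors.New("discovery backend unreachable")

// Refresh queries for peers and replaces the known list only on success.
// On failure the previous list is kept, so a transient network error does
// not leave the node with no peers to route to.
func (p *PeerSet) Refresh(list PeerLister) error {
	found, err := list()
	if err != nil {
		return err // the existing peer list is left untouched
	}
	p.mu.Lock()
	p.peers = append([]string(nil), found...)
	p.mu.Unlock()
	return nil
}

// Peers returns a copy of the currently known peers.
func (p *PeerSet) Peers() []string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return append([]string(nil), p.peers...)
}
```

A caller would invoke Refresh on a timer and log any returned error.
Routing keeps using whatever Peers returns, which after a failed
lookup is the last good list rather than an empty one.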
There were also several process problems leading up to this
deployment, as this release was not intended to be deployed to
production SimplePush. The Bugzilla ticket in question (#1097324)
indicated that deployment should occur for both production SimplePush
and LoopPush, when it should have been deployed only to LoopPush. We are
currently conducting a more thorough postmortem of the issue to
determine appropriate steps to prevent unintended deployments like
this from occurring again.
--
Maire Reavy <[email protected]>
Mozilla
_______________________________________________
dev-media mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-media