> From: James Bensley <[email protected]>
> Sent: Thursday, June 27, 2019 9:56 AM
>
> One experience I have made is that when there is an outage on a large PE,
> even when it still has spare capacity, is that the business impact can be too
> much to handle (the support desk is overwhelmed, customers become irate
> if you can't quickly tell them what all the impacted services are, when
> service will be restored, the NMS has so many alarms it’s not clear what the
> problem is or where it's coming from etc.).
>
I see what you mean. My hope is to address these challenges with a "single
source of truth" provisioning system that will include, among other things, a
HW-to-customer/service mapping, so the Ops team will be able to say that if a
particular LC X fails then customers/services X, Y, Z will be affected.
But yes, I agree that with smaller PEs the fallout from any failure is reduced
proportionally.

>
> This doesn’t mean there isn’t a place for large routers. For example, in a
> typical network, by the time we get to the P nodes layer in the core we tend
> to have high levels of redundancy, i.e. any PE is dual-homed to two or more P
> nodes and will have 100% redundant capacity.
Exactly. While the service edge topology might be dynamic as a result of
horizontal scaling, the core, on the other hand, should I'd say be fairly
static and scaled vertically. That is, I wouldn't want to scale core routers
horizontally and as a result have the core topology change with every P
scale-out iteration at any POP; that would be bad news for capacity planning
and traffic engineering...
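
To make the HW-to-customer/service mapping idea mentioned earlier a bit more concrete, here is a rough Python sketch. It assumes a hypothetical inventory of (PE, line card, customer, service) records; the node names, slots, customers and service IDs are all invented for illustration and do not come from any real provisioning system.

```python
from collections import defaultdict

# Hypothetical inventory records: (PE, line card slot, customer, service).
# In practice these would come from the provisioning system; the values
# below are invented purely for illustration.
ATTACHMENTS = [
    ("pe1", "lc0", "cust-a", "l3vpn-100"),
    ("pe1", "lc0", "cust-b", "l2vpn-200"),
    ("pe1", "lc1", "cust-a", "inet-300"),
    ("pe2", "lc0", "cust-c", "l3vpn-400"),
]


def build_index(attachments):
    """Index customer/service pairs by the (PE, line card) they attach to."""
    index = defaultdict(set)
    for pe, lc, customer, service in attachments:
        index[(pe, lc)].add((customer, service))
    return index


def impacted_by_lc(index, pe, lc):
    """Return the sorted customer/service pairs affected if one line card fails."""
    return sorted(index.get((pe, lc), set()))


def impacted_by_pe(index, pe):
    """Return the sorted customer/service pairs affected if a whole PE fails."""
    hits = set()
    for (node, _lc), services in index.items():
        if node == pe:
            hits |= services
    return sorted(hits)


if __name__ == "__main__":
    idx = build_index(ATTACHMENTS)
    print(impacted_by_lc(idx, "pe1", "lc0"))
    print(impacted_by_pe(idx, "pe1"))
```

The per-PE query is the one that grows with the size of the box: the more attachments a single PE carries, the longer the list the support desk has to work through when it fails.
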
>
> I’ve tried to write some of my experiences here
> (https://null.53bits.co.uk/index.php?page=few-larger-routers-vs.-many-smaller-routers).
> The tl;dr version though is that there’s rarely a technical restriction to
> having fewer large routers and it’s an operational/business impact problem.
>
I'll give it a read, cheers.

adam

