Great post.

>But here's the NEXT problem - all Optus networking is offshore. There's almost no-one in Australia who can physically fix it.

>So what do you do when your offshore outsourced network guys break your core network infrastructure, and you've retrenched everyone who can fix it locally?

>You have a 7 hour outage, that's what you do.

Ouch!

We have government and other services depending on a foreign company with 
critical infrastructure overseas!!!!


On 2023/11/9 11:33 am, Kate Lance wrote:
Hi Narelle,

An interesting post on Mastodon from Rob Thomas, supporting the idea it was
a route reflector overload -
https://mastodon.au/@xrobau/111376847362633903

The problem yesterday started at about 4am, when Optus told the world 'I no
longer have any internet connectivity', and 'Do not send any internet traffic
to me, at all'. The technical description is that they withdrew ALL of their
routes from the #DFZ (which is "the Internet" as seen by all the core routers
that ACTUALLY control the internet).
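To make "withdrew ALL of their routes" concrete, here is a toy sketch (not how any real router stores routes) of a routing table doing longest-prefix match before and after every route from one origin is withdrawn. The AS numbers 7474 and 4804 come from the Cloudflare Radar links in this thread; the prefixes are RFC 5737 documentation ranges chosen purely for illustration, not real Optus prefixes.

```python
import ipaddress

# Toy routing table: prefix -> origin AS. Prefixes are RFC 5737
# documentation ranges used for illustration only.
RIB = {
    ipaddress.ip_network("192.0.2.0/24"): 4804,
    ipaddress.ip_network("198.51.100.0/24"): 7474,
    ipaddress.ip_network("203.0.113.0/24"): 64500,  # some unrelated network
}


def lookup(rib, addr):
    """Longest-prefix match: return the origin AS, or None if unreachable."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in rib if ip in net]
    if not matches:
        return None
    return rib[max(matches, key=lambda net: net.prefixlen)]


def withdraw_all(rib, origin_as):
    """Return a copy of the table with every route from origin_as removed."""
    return {net: asn for net, asn in rib.items() if asn != origin_as}


after = withdraw_all(withdraw_all(RIB, 7474), 4804)

print(lookup(RIB, "198.51.100.10"))    # 7474
print(lookup(after, "198.51.100.10"))  # None: nowhere to send the packet
print(lookup(after, "203.0.113.5"))    # 64500: everyone else still reachable
```

Once the routes are gone, the rest of the internet has no entry to match, so traffic for those prefixes is simply dropped.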

However, as a precursor at about 3am there was a hint that things weren't
perfect, as there was a flurry of changes from Optus to the outside world
saying, roughly, 'Something has changed inside my network, but you can still
keep sending me stuff'.

Now, as two final bits of possibly relevant information, the default for
maximum-prefix on #Cisco #ASR9000 is 1048576 (this number is 'the number of
routes that can be accepted by this router'), and MOST IMPORTANTLY the DFZ
("the internet") has about 980,000 routes in it at the moment. That's only about
68k routes LESS than the default maximum.

I'd be amazed if Optus has fewer than 100k internal routes that aren't visible
to the internet but are visible internally.
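The headroom argument is simple arithmetic. A quick check using the figures quoted in the post; note the 100k internal-route figure is the author's guess, not a measured number.

```python
# Figures quoted in the post; INTERNAL_ROUTES is the author's lower-bound
# guess, not a measured value.
DEFAULT_MAX_PREFIX = 1_048_576  # 2**20, the quoted ASR9000 default
DFZ_ROUTES = 980_000            # approximate full-table size quoted
INTERNAL_ROUTES = 100_000       # assumed internal routes

headroom = DEFAULT_MAX_PREFIX - DFZ_ROUTES
total = DFZ_ROUTES + INTERNAL_ROUTES
overflow = total - DEFAULT_MAX_PREFIX

print(headroom)                    # 68576
print(total)                       # 1080000
print(overflow)                    # 31424
print(total > DEFAULT_MAX_PREFIX)  # True
```

On these numbers, a router that sees both the full table and the internal routes would be over the default limit by roughly 31k routes.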

So here's what I think happened. At 3am, the first core #router was
upgraded, and a new config was put in place. It did not join the network
correctly, and things were half broken. What SHOULD have happened is that all
changes stopped, and were either rolled back or paused for further
investigation (the cause being that more than 1 million routes were visible,
causing it to shut down).
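A toy model of the "shut down" behaviour described above: a session that is torn down, losing everything learned over it, once the received prefix count exceeds its maximum-prefix limit. This is an assumption-laden sketch, not Cisco's implementation; real routers also offer warning-only and restart-timer options that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class PeerSession:
    """Toy BGP session with a maximum-prefix limit.

    Assumption: exceeding the limit tears the session down and discards
    every route learned over it. Warning-only modes and restart timers
    are deliberately left out.
    """
    max_prefix: int
    received: int = 0
    state: str = "Established"

    def receive(self, n_prefixes: int) -> None:
        if self.state != "Established":
            return  # a torn-down session accepts nothing
        self.received += n_prefixes
        if self.received > self.max_prefix:
            self.state = "Idle"  # session shut down
            self.received = 0    # all routes from this peer are lost


session = PeerSession(max_prefix=1_048_576)
session.receive(980_000)  # full external table: still under the limit
print(session.state, session.received)  # Established 980000
session.receive(100_000)  # plus the guessed internal routes
print(session.state, session.received)  # Idle 0
```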

However, someone decided 'Well, maybe if we upgrade the SECOND one, that'll fix
the first one' at 4am. That broke the SECOND one, and took Optus completely off
the internet.
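The "stop and roll back" discipline the post says should have applied can be sketched as a staged rollout gate. The device names `rr1`/`rr2` and the `upgrade`, `healthy` and `rollback` functions are hypothetical stand-ins, not any real tooling.

```python
from typing import Callable, List, Optional, Tuple


def staged_rollout(
    devices: List[str],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Tuple[List[str], Optional[str]]:
    """Upgrade one device at a time; stop at the first unhealthy one.

    Returns (devices left on the new config, device that failed or None).
    """
    upgraded: List[str] = []
    for dev in devices:
        upgrade(dev)
        if not healthy(dev):
            rollback(dev)         # put the failed device back
            return upgraded, dev  # never touch the remaining devices
        upgraded.append(dev)
    return upgraded, None


# Hypothetical harness: two route reflectors, and a new config that
# never comes up healthy.
state = {"rr1": "old", "rr2": "old"}
log: List[str] = []


def upgrade(dev: str) -> None:
    state[dev] = "new"
    log.append(f"upgrade {dev}")


def healthy(dev: str) -> bool:
    return state[dev] != "new"


def rollback(dev: str) -> None:
    state[dev] = "old"
    log.append(f"rollback {dev}")


print(staged_rollout(["rr1", "rr2"], upgrade, healthy, rollback))  # ([], 'rr1')
print(log)  # ['upgrade rr1', 'rollback rr1']
```

The key property is that a failure on the first device means the second is never touched, so redundancy survives a bad change.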

(Continued, see next for why this is far worse than it should have been)
.....


Regards, Kate


On Wed, Nov 08, 2023 at 05:33:43PM +1100, Narelle Clark wrote:
Rumour has it that it was a BGP update from an external source that wasn't
filtered properly, and the BGP route reflectors then overloaded the internal
routers with it. Persistently.
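What "filtered properly" might mean in practice is an inbound policy applied before an external update ever reaches the route reflectors. This sketch is hypothetical: the /24 length limit and the 1,000-prefix cap are illustrative values, not anything known about Optus's configuration.

```python
import ipaddress
from typing import Iterable

MAX_IPV4_LEN = 24   # illustrative: drop IPv4 prefixes more specific than /24
MAX_ACCEPT = 1_000  # illustrative per-peer cap on accepted prefixes


def filter_update(prefixes: Iterable[str],
                  max_len: int = MAX_IPV4_LEN,
                  cap: int = MAX_ACCEPT) -> list:
    """Apply a hypothetical inbound policy to prefixes from an external peer.

    Overly specific prefixes are dropped; if the peer still sends more than
    `cap` prefixes, the whole update is refused rather than passed inward.
    """
    accepted = []
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if net.version == 4 and net.prefixlen > max_len:
            continue  # too specific: drop it
        accepted.append(net)
    if len(accepted) > cap:
        raise ValueError(f"peer sent {len(accepted)} prefixes, cap is {cap}")
    return accepted


print(filter_update(["203.0.113.0/24", "203.0.113.128/25", "198.51.100.0/24"]))
# [IPv4Network('203.0.113.0/24'), IPv4Network('198.51.100.0/24')]

flood = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1_500)]
try:
    filter_update(flood)
except ValueError as err:
    print(err)  # peer sent 1500 prefixes, cap is 1000
```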

It was clearly an internal transport problem arising from an underlying IP
protocol. BGP fits that bill completely as it would be redistributed, and
clearly their management network isn't sufficiently out of band. Once a
network of that scale goes down like that, you can't just turn it back on
and expect it to all work fine - millions of devices all want to
re-register at once, and all those state changes across the network have to
converge...
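A crude model of why "millions of devices all want to re-register at once" hurts: if the core admits only a fixed number of registrations per second and every waiting device retries every second with no backoff, the total signalling load far exceeds the device count. The device count and capacity below are made-up illustrative numbers.

```python
from typing import Tuple


def reregistration_storm(devices: int, capacity_per_sec: int) -> Tuple[int, int]:
    """Toy model: every waiting device retries once per second, no backoff.

    Returns (seconds until everyone is registered, total attempts made).
    """
    if capacity_per_sec <= 0:
        raise ValueError("capacity must be positive")
    waiting = devices
    seconds = 0
    attempts = 0
    while waiting > 0:
        attempts += waiting                        # everyone still waiting tries
        waiting -= min(waiting, capacity_per_sec)  # the core admits only so many
        seconds += 1
    return seconds, attempts


print(reregistration_storm(10, 4))              # (3, 18)
print(reregistration_storm(10_000_000, 5_000))  # (2000, 10005000000)
```

Ten million devices generate about ten billion attempts in this model, which is why recovery is not as simple as turning the network back on.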

Narelle

On Wed, 8 Nov 2023 at 10:30, Alex (Maxious) Sadleir <[email protected]>
wrote:

Around 4am, Optus networks re-announced all their BGP routes at once
https://radar.cloudflare.com/routing/as7474
https://radar.cloudflare.com/routing/as4804
This is indicative of a change management malfunction, akin to IBM's
2016 eCensus routers restarting with no routes

https://www.itnews.com.au/news/ibm-treasury-in-settlement-talks-over-census-failure-440066
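The "all routes at once" signature visible on Cloudflare Radar is essentially a spike in BGP update volume. A minimal sketch of flagging such a spike, assuming you already have update arrival times in seconds from some start point; the data below is synthetic, not Optus's actual updates.

```python
from collections import Counter
from statistics import median


def updates_per_minute(times):
    """Bucket update arrival times (seconds from an arbitrary start) by minute.

    Minutes with no updates are simply absent, a simplification.
    """
    return Counter(int(t) // 60 for t in times)


def burst_minutes(counts, factor=10):
    """Minutes whose count is more than `factor` times the median minute."""
    typical = median(counts.values())
    return sorted(m for m, n in counts.items() if n > factor * typical)


# Synthetic data: two updates a minute for ten minutes, then a flood.
quiet = [m * 60 + 5 for m in range(10)] + [m * 60 + 30 for m in range(10)]
storm = [600 + i * 0.1 for i in range(500)]
counts = updates_per_minute(quiet + storm)
print(burst_minutes(counts))  # [10]
```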

The VoWiFi infrastructure seems to be online but unable to connect any
calls
https://goughlui.com/2023/11/08/breaking-optus-nationwide-outage-08-11-2023/
More alarmingly, 000 doesn't work on landlines
https://twitter.com/lucethoughts/status/1722029287727825124
contrary to advice from emergency services
https://twitter.com/nswpolice/status/1722028862161449151

On Wed, Nov 8, 2023 at 10:08 AM Tom Worthington
<[email protected]> wrote:
Any more news on what caused the Optus network outage? On ABC Canberra
Radio this morning I suggested it was most likely a software upgrade
which went wrong, and would be fixed by 6pm.

Is VoWiFi working?

I use Telstra, but when COVID-19 struck, I purchased an Optus 4G modem,
with an Optus SIM. This was in case Telstra went down.


--
Tom Worthington http://www.tomw.net.au
_______________________________________________
Link mailing list
[email protected]
https://mailman.anu.edu.au/mailman/listinfo/link


--


Narelle
[email protected]

--
Kim Holburn
IT Network & Security Consultant
+61 404072753
mailto:[email protected]  aim://kimholburn
skype://kholburn - PGP Public Key on request

