Is the incident completely over?

I still see one symptom, although apparently no packet loss.


> On 15 Jan 2019 at 22:44, Simon Muyal <smu...@franceix.net> wrote:
> 
> Hello Raphael,
> 
> 
> As Youssef said, we communicated yesterday on our primary channel, the 
> France-IX ML, which is followed by all France-IX members. We posted a more 
> complete report today, see below.
> 
> We did verify yesterday evening that the problems with CF or Google during 
> the evening were not related to France-IX.
> 
> As for communication during incidents, we favour the France-IX ML and ask 
> members to use this channel for better follow-up. We will also soon create 
> a page to display ongoing maintenances/incidents. This will also give 
> visibility to non-members, as you point out.
> 
> 
> ++
> 
> Simon
> 
> 
> ---
> 
> Dear members,
> 
> You will find below a report concerning the issue encountered yesterday in 
> the afternoon:
> 
> *12:20 (Paris time):* We started observing some unusual BUM traffic 
> (Broadcast, Unknown Unicast, Multicast) on PoPs where BUM rate limiting is 
> performed globally (not per interface): PA7, PAR1 and TH3 PoPs.
> 
> We tried to determine the origin of this flooded traffic, looking for loops 
> and checking MAC address consistency on different PoPs. At this stage, our 
> probe network (a 10G probe per device) didn't raise any alert and there was 
> no loss observed by the probes. Nonetheless, we had some members complaining 
> and indicating losses towards France-IX.
> 
> The sniffer captures allowed us to determine that it was unknown unicast 
> traffic from several sources to a few destinations. BUM traffic reached 10 to 
> 15 Mbps. This traffic was observed even though MAC table entries were OK.
> 
> *Around 15:00 (Paris time):* BUM traffic reached more than 50 Mbps, causing 
> additional impact, mainly on small and medium routers on the customer side. 
> We cleared some MAC address entries where we observed flooding, with no 
> effect. As we didn't observe any abnormal behaviour on the customer side, we 
> started clearing some MPLS/LSP circuits and shutting down backbone links one 
> by one to avoid creating additional impact. This allowed us to isolate the 
> problem: the issue was located on the PAR5 PoP, by clearing the MPLS circuits 
> used between PAR5 and PAR1. During these operations, the PAR1 PoP was 
> isolated for 4 minutes, between 16:00 and 16:04, in order to find the root 
> cause.
> 
> We are in touch with the vendor to understand this behaviour and are sharing 
> logs to find the root cause. We will keep you informed as soon as we have 
> more information.
> 
> ---
> Location: FranceIX Paris LAN
> 
> Incident start: 14th of January 2019, 12:21 (UTC+1, Paris Time)
> Incident end: 14th of January 2019, 16:08 (UTC+1, Paris Time)
> 
> Customer impact: Some members observed packet loss during this period
> ---
> 
> Below is the work in progress to detect this kind of issue:
> 
>   - Specific alerts when the BUM traffic threshold is reached on every PoP 
> (*already done since yesterday*)
>   - Enhancement of the QoS probes to be as close as possible to a member 
> configuration: a BGP router configured on each probe and permanent traffic 
> generated. This will be deployed in Q1-2019
>   - We plan to test the 18.R1 firmware soon. This version improves the way 
> processes and memory are managed in the platform. It will be tested during 
> Q1-2019 and probably deployed during Q2-2019
>   - EVPN: in the long term, we plan to activate EVPN, so that BUM traffic is 
> better controlled
>   - Definition of a specific process to react quickly if the issue occurs 
> again
> 
> We apologize again for this issue. We are sorry if you feel we didn't 
> communicate enough during the incident; we communicated as soon as we had new 
> information to provide.
> 
> 
> On 14/01/2019 at 21:40, Raphael Mazelier wrote:
>> On 14/01/2019 20:59, Radu-Adrian Feurdean wrote:
>> 
>>> Almost certainly not. The traffic had also disappeared via Equinix-IX and 
>>> moved (after a sharp drop) entirely onto transit. Right now it seems to be 
>>> picking up a little on the Equinix side. On the FranceIX side, I don't 
>>> know (I'm prepending), but the "Akamai festival" has indeed started 
>>> tonight's episode (traffic shifting from PNI to France-IX).
>>> 
>> 
>> OK, thanks for the clarification. What made me think of that was reports 
>> from people who had lost 8.8.8.8 as well. Otherwise, what is hammering the 
>> CDNs at the moment so that they have to re-route?
>> 
>> -- 
>> Raphael Mazelier
>> 
>> 
>> 
>> ---------------------------
>> FRnOG mailing list
>> http://www.frnog.org/
> 
> -- 
> Simon Muyal
> CTO
> FranceIX
> Tel: +33 (0)1 70 61 97 74
> Mob: +33 (0)6 21 17 29 51
> 
> 

