RE: A survey about networking incidents

Aaron Gould Thu, 24 Jan 2019 09:33:26 -0800

It seems that this is even harder in a MEF/SP-type emulated Layer 2 network of 
E-Line, E-LAN, and E-Tree type services.

Yeah, it seems you have to generate synthetic traffic and insert it into the 
data path so that you have something to measure.
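
As a rough illustration of what you might do with such synthetic probes (this 
is not any vendor's tool; the function name and the sample values are made up), 
each batch of probe results could be boiled down to loss and round-trip 
figures along these lines:

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize one batch of synthetic probes.

    rtts_ms holds one entry per probe sent: the round-trip time in
    milliseconds, or None if no reply came back before the timeout.
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        # Guard against an empty batch so we never divide by zero.
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "rtt_min_ms": min(replies) if replies else None,
        "rtt_avg_ms": mean(replies) if replies else None,
        "rtt_max_ms": max(replies) if replies else None,
    }


# Hypothetical sample: five probes sent, one lost.
print(summarize_probes([1.2, 1.4, None, 1.3, 1.1]))
```

Keeping a history of these summaries per path is the kind of record you could 
show when someone asks whether the network was dropping or delaying traffic.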

 

Isn’t CFM/Ethernet OAM supposed to segment the network into management 
domains of responsibility with MIPs, MEPs, etc., so that you can monitor your 
part in real time and others can monitor theirs? I have not set this up, but I 
thought that was one way to know the ongoing state of the network, link by 
link and endpoint to endpoint. I think CCMs (continuity check messages) flow 
continuously to give you an idea of which links and services are good or not.

Perhaps that’s the proof you could point to when anyone tries to blame the 
network.

I’m sure there are other ways, like Cisco’s IP SLA, Accedian’s PAA, and TWAMP. 
(I just remembered TWAMP, and I think it’s perhaps an IP-layer version of what 
CFM/OAM does at the Ethernet layer; I could be wrong.) As I think about it, I 
also recall MPLS OAM, and perhaps others too.

Yes, as network engineers, we continually have to clear our name (clear the 
network) of blame.

-Aaron

 

P.S. I’ll try to look at the survey later.

From: NANOG [mailto:[email protected]] On Behalf Of Yu, Minlan
Sent: Wednesday, January 23, 2019 9:32 AM
To: [email protected]
Subject: A survey about networking incidents

 

Hi NANOG,

We all know that networks are at the heart of many of the systems we use today. 
When these systems break, the underlying networks are often the first suspects. 
Networks are hard to diagnose, and they are likely to be blamed for problems 
even when they are completely healthy. As networking engineers, we have all 
seen cases where another part of the system was causing an issue but the 
network remained the suspect until the problem was resolved.

We are researchers from Harvard and the University of Pennsylvania who are 
interested in better understanding this problem and its impact in order to 
build a solution. Our goal is to be able to quickly rule out the network as a 
root cause of incidents, both to speed up diagnosis and to improve operator 
efficiency. Specifically, we would like to know: How often do you see problems 
where the network is blamed but, after investigating, you find the problem was 
caused by some other part of the system? How often have you had incidents where 
the cause was outside the boundary of your organization? How much do you think 
fixing this problem would help you and your organization diagnose problems 
more quickly?

We have created a *very* short survey to get an operator's perspective on 
these questions. It should take less than 15 minutes to finish. The findings 
should help us, as well as the research community at large, build a solution 
that can improve diagnosis for networks of all types and sizes. We will 
present the results of this anonymous survey in a scientific article later 
this year, and we will report back on our research once it's finished.

Survey URL: 
https://docs.google.com/forms/d/e/1FAIpQLScx-U54eQFQi5AdBCOOucMaI6BVmLwcMFiZl2HVZ9bHi1q8bA/viewform

 

We would greatly appreciate it if you could help us with this research. Please 
feel free to forward this survey to other operators you know. Thank you!

Minlan Yu

http://minlanyu.seas.harvard.edu/
