> On May 9, 2017, at 3:32 PM, Schweiss, Chip wrote:
>
> This was a first for me and extremely painful to locate.
>
> In the middle of the night between last Friday and Saturday, I started
> getting down alerts from most of my network. It took 4 engineers including
> myself 9 hours to pinpoint the source of the problem.
>
> The problem turned out to be one of my OmniOS boxes sending out pure garbage
> constantly on layer 2 out the 10G network ports. This disrupted ARP caches
> on every machine on every VLAN that was trunked on these ports, not just the
> VLANs that were configured on the server. The switches reported every port
> healthy and without error. The traffic on the bad port was not high either,
> just severely disruptive.
Whoa! On layer 2 (as in non-TCP/IP Ethernet frames)?
> The affected OmniOS box appeared to be healthy, as it was still serving the VM
> data stores for over 350 virtual machines. However, it, like every other
> service on the network, appeared to be up and down repeatedly, but NFS kept on
> recovering gracefully.
>
> The only thing that finally identified this server was when one of us plugged
> a monitor into the console and saw "WARNING: proxy ARP problem?" happening so
> fast that it took a cellphone picture at a high frame rate to read it.
> Powering off this server cleared the problem for the entire network,
> and its pools were taken over by its HA sister.
If it's easy to do so, unplug or "ifconfig down" the interface next time this
happens.
> Googling for that warning brings up nothing useful.
>
> Has anyone ever seen a problem like this? How did you locate it?
If you search src.illumos.org, you'll find this:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/inet/ip/ip_arp.c#1449
We appear to be freaking out over another node having our IP. The only call
site that uses AR_CN_BOGON comes after ip_nce_resolve_all() returns AR_BOGON.
I wonder if some other entity had the same IP, and the two of them fed back
upon each other negatively?
The message you cite should show an IP address with it:
"proxy ARP problem? Node '%s' is using %s on %s",
where the %s-es are MAC-address, IP-address, and interface-name respectively.
You didn't get examples with your digital camera, did you?
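
If the warnings also landed in a log somewhere, something like the rough
Python sketch below could pull the three fields out and count repeats per
(MAC, IP, interface). It assumes each line follows the format string quoted
above; the helper names parse_warning() and summarize() are made up for
illustration, not anything from illumos.

```python
import re
from collections import Counter

# Assumed line shape, taken from the format string quoted above:
#   proxy ARP problem?  Node '<mac>' is using <ip> on <ifname>
PATTERN = re.compile(
    r"proxy ARP problem\?\s+Node '(?P<mac>[^']+)' "
    r"is using (?P<ip>\S+) on (?P<ifname>\S+)"
)


def parse_warning(line):
    """Return (mac, ip, ifname) if the line holds the warning, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("mac"), m.group("ip"), m.group("ifname")


def summarize(lines):
    """Count how often each (mac, ip, ifname) triple shows up."""
    counts = Counter()
    for line in lines:
        parsed = parse_warning(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts
```

Feeding it the lines of a log file (for example "summarize(open(path))", with
path being wherever your console warnings get logged) would show which MAC
address is claiming which IP on which interface, and how often.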
Dan