Hi Gang,

This isn't really a question - more of a report. I'm posting this here in case it comes up again and saves anyone from any extended brain damage.

Earlier today one of our customers on a 5209R server issued a reboot and then reported that the system failed to come back online. We got console on the box and found it sitting at the login prompt, so we started investigating.

At first it appeared that the IP addresses were active on the box, but there was no network communication. ifconfig reported the primary IP and all of the associated aliases, but there was no link at the NIC.
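For anyone who wants to check the same symptom (addresses configured, no link) from the console, here is a minimal sketch. It assumes the kernel exposes link state in /sys/class/net/<iface>/carrier, which is standard on Linux; the function name link_state and its second argument are made up for illustration and are not something we ran at the time.

```shell
#!/bin/sh
# Report link state for an interface by reading the kernel's carrier flag.
# link_state is a hypothetical helper name; the second argument exists
# only so the function can be pointed at a different sysfs root.
link_state() {
    iface="${1:-eth0}"
    sysfs_root="${2:-/sys/class/net}"
    # carrier holds 1 when a link is detected and 0 when it is not.
    # The read fails if the interface is missing or administratively down.
    carrier=$(cat "$sysfs_root/$iface/carrier" 2>/dev/null) || carrier=""
    case "$carrier" in
        1) echo "link up" ;;
        0) echo "no link" ;;
        *) echo "unknown" ;;
    esac
}
```

Running `link_state eth0` next to ifconfig shows quickly whether the addresses are sitting on an interface that has no carrier. `ethtool eth0` gives the same answer on its "Link detected" line.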

After troubleshooting the physical connections and eliminating the switch as a problem, we moved on to diagnosing the network connection at the OS level. One curious thing we found was that /etc/udev/rules.d/70-persistent-net.rules had two distinct MAC entries listed for eth0. Two entries is what you would expect when there are 2 interfaces (typically eth0 and eth1), but here both entries were named eth0.
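A quick way to spot this kind of duplicate is sketched below. It assumes the rules file uses the usual NAME="ethX" assignments; the function name dup_udev_names is hypothetical.

```shell
#!/bin/sh
# List interface names that are assigned by more than one rule.
# dup_udev_names is a hypothetical helper; the default path is the
# rules file mentioned above.
dup_udev_names() {
    rules_file="${1:-/etc/udev/rules.d/70-persistent-net.rules}"
    # Drop comment lines, extract each NAME="..." assignment,
    # then print only the assignments that occur more than once.
    grep -v '^[[:space:]]*#' "$rules_file" \
        | grep -o 'NAME="[^"]*"' \
        | sort | uniq -d
}
```

On a box in the state described above, this would print NAME="eth0". An empty result means every name is assigned only once.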

Our usual procedure at that point is to empty out 70-persistent-net.rules and reboot so that udev rebuilds it. But after doing this, there were NO entries in 70-persistent-net.rules. Since we knew the MAC address on the interface, we added a manual entry and rebooted again. No joy.
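For reference, a manual entry of the kind we mean looks roughly like what the sketch below prints. The rule layout follows the format the older persistent-net generator wrote; the function name make_net_rule and the MAC address in the usage line are placeholders, not values from this server.

```shell
#!/bin/sh
# Print a persistent-net style udev rule for a given MAC address.
# make_net_rule is a hypothetical helper; it only prints the rule and
# does not touch the rules file.
make_net_rule() {
    # sysfs reports MAC addresses in lowercase and udev matches
    # case-sensitively, so normalise the input first.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    name="${2:-eth0}"
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", KERNEL=="eth*", NAME="%s"\n' "$mac" "$name"
}
```

Usage would be something like `make_net_rule 00:11:22:AA:BB:CC eth0 >> /etc/udev/rules.d/70-persistent-net.rules`. Note the KERNEL=="eth*" match: a rule like this only fires if the kernel hands the device an ethX name to begin with.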

When that happened, I began to wonder why the system did not see an eth0. We searched for other possible interface names but didn't see any. I then poked my head into the grub configuration, because on a RHEL7 box we typically need to force the ethX naming convention there by specifying the net.ifnames=0 kernel parameter. It was missing.
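A sketch of that check, assuming the running kernel's parameters are readable from /proc/cmdline (standard on Linux). The function name has_ifnames_off is hypothetical.

```shell
#!/bin/sh
# Return success if net.ifnames=0 is on the kernel command line.
# has_ifnames_off is a hypothetical helper; the optional argument lets
# it read a saved copy of the command line instead of /proc/cmdline.
has_ifnames_off() {
    cmdline_file="${1:-/proc/cmdline}"
    # Pad with spaces so only a whole parameter matches,
    # not a longer value such as net.ifnames=01.
    case " $(cat "$cmdline_file") " in
        *" net.ifnames=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Something like `has_ifnames_off || echo "net.ifnames=0 is missing"` tells you whether the box booted with the parameter, and `ls /sys/class/net` shows which interface names the kernel actually created.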

To fix, we edited the kernel line in /etc/default/grub (it begins with GRUB_CMDLINE_LINUX) and appended "net.ifnames=0" (no quotes) onto the tail end of that line. After that, we regenerated the grub configuration with the grub2-mkconfig command so the change would take effect. (We backed up the existing conf file first, just in case.)
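We made the edit by hand, but the same change can be sketched as a small script. This assumes the stock RHEL7 layout, where GRUB_CMDLINE_LINUX lives in /etc/default/grub as a single double-quoted line; the function name add_ifnames_off and the .bak suffix are my own choices for illustration.

```shell
#!/bin/sh
# Append net.ifnames=0 to the GRUB_CMDLINE_LINUX line, keeping a backup.
# add_ifnames_off is a hypothetical helper; pass a different path to
# try it on a copy of the file first.
add_ifnames_off() {
    grub_defaults="${1:-/etc/default/grub}"
    # Nothing to do if the parameter is already on the line.
    if grep -q '^GRUB_CMDLINE_LINUX=.*net\.ifnames=0' "$grub_defaults"; then
        return 0
    fi
    # Keep a copy of the original, then rewrite from that copy.
    cp -p "$grub_defaults" "$grub_defaults.bak" || return 1
    # Insert the parameter just before the closing quote of the line.
    sed 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 net.ifnames=0"/' \
        "$grub_defaults.bak" > "$grub_defaults"
}

# After the edit, regenerate the boot configuration. The path below is
# the usual BIOS location; UEFI installs keep grub.cfg under /boot/efi.
#   cp -p /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bak
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The regeneration step is left as comments on purpose, since the output path differs between BIOS and UEFI installs and should be confirmed on the box before running it.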

After another reboot, the system came online just like it always did.

It's unknown to us whether the customer or any of the customer's users may have made edits to the grub config, or if something got changed in a recent yum update, or some other fluke.

FWIW, the customer reported the reboot was initiated because httpd had stopped responding. Rather than tackle that, the box got rebooted in a bid to restore service faster than doing an investigation on the initial problem. My guess is the httpd lock-up is related to other reports that have been made here and isn't directly connected to the grub issue.

The only thing that gives me concern, though, is whether the boxes that had Apache lock up on them late last week / early this week are going to do the same thing when they are rebooted. (Obviously this would apply to physical bare-metal installs and not Aventurin{e} virtuals.)

--
Chris Gebhardt
VIRTBIZ Internet Services
Access, Web Hosting, Colocation, Dedicated
www.virtbiz.com | toll-free (866) 4 VIRTBIZ
_______________________________________________
Blueonyx mailing list
Blueonyx@mail.blueonyx.it
http://mail.blueonyx.it/mailman/listinfo/blueonyx
