This is just a reflection on the earlier name resolution incident. I find it remarkable how much goes into solving a problem and, as a corollary, how much impact a simple problem can have. Just my braindump as a relatively novice sysadmin.

Here's the chain of events:
- This morning at 9am, our web server choked. I saw Apache was using up MaxClients.
- After poking around the various daemons and looking at logs, I figured out that everything was running correctly.
- I somehow narrowed it down to the script that pings the OCLC chat availability service: it was waiting 20+ seconds and finally timing out, *despite* the fact that I thought it was set up with a 2-second timeout (I don't remember how I got it down to that).
- I shut that down temporarily and disabled our chat function, which got the server back to normal.
- I browsed the service manually, which worked, then tried two different techniques in the PHP (file_get_contents() and curl), both of which failed (there's a sketch of a timeout-enforced version of that check after this list).
- I went to Brooklyn to do some vigilante digitization and have lunch with my boss.
- I got back to the office, saw nothing had changed, and started digging deeper into the curl request.
- I found the name resolution error, which blew my mind
- I tried resolving multiple ways, and failing that, came here (the second sketch below shows the sort of check I mean).
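
For anyone curious, here's a minimal sketch of what the availability ping might look like with a hard timeout enforced by curl. The URL and the response parsing are placeholders, not our actual script. The point is that curl's connect timeout also covers the name-lookup phase, while the plain stream timeout you'd hand to file_get_contents() may not, which could explain a "2-second timeout" that still hangs for 20+ seconds.

    <?php
    // Hedged sketch, not our production script: ping the chat availability
    // service with a hard timeout so one slow vendor response can't tie up
    // an Apache child. The URL and the "available" string are placeholders.
    $url = 'http://chat-availability.vendor.example/status';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of echoing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);    // connect phase, including the DNS lookup, capped at 2s
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);           // the whole request capped at 2s
    $body = curl_exec($ch);

    if ($body === false) {
        // Timeout, resolution failure, whatever -- just treat chat as unavailable
        error_log('chat availability check failed: ' . curl_error($ch));
        $chat_available = false;
    } else {
        $chat_available = (strpos($body, 'available') !== false); // placeholder parsing
    }
    curl_close($ch);

    // Rough file_get_contents() equivalent, with a timeout on the HTTP stream:
    // $ctx  = stream_context_create(array('http' => array('timeout' => 2)));
    // $body = @file_get_contents($url, false, $ctx);

Either way, failing fast means the Apache children get released instead of piling up against MaxClients.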
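
And a sketch of the kind of resolution check that would have pointed at the real problem sooner; the hostname is a placeholder. As far as I can tell, gethostbyname() goes through the system resolver, the same path curl and file_get_contents() take by default, while dns_get_record() queries the nameservers directly, so comparing the two helps separate "the vendor's record changed" from "our box can't resolve anything".

    <?php
    // Hedged sketch: can the web server resolve the vendor hostname at all?
    // The hostname is a placeholder, not the real service.
    $host = 'chat-availability.vendor.example';

    $ip = gethostbyname($host); // system resolver; returns the name unchanged on failure
    if ($ip === $host) {
        echo "system resolver could not resolve $host\n";
    } else {
        echo "$host resolves to $ip\n";
    }

    // Ask DNS directly for the A records (skips /etc/hosts)
    $records = dns_get_record($host, DNS_A);
    if (empty($records)) {
        echo "no A records returned for $host\n";
    } else {
        print_r($records); // each answer includes the ip and the ttl
    }

(dig or host from a shell will tell you much the same thing, of course.)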

Thanks to all who contributed ideas... amazing how one change to a vendor DNS server can lead to our web server DoS'ing itself. More networking knowledge... must get more networking knowledge...

--
Yitzchak Schaffer
Systems Manager
Touro College Libraries
212.742.8770 ext. 2432
http://www.tourolib.org/

Access Problems? Contact systems.libr...@touro.edu
