I was able to uninstall VirusScan in safe mode, and the server has now stayed up for two hours without crashing. Before that it wouldn’t stay up for 2 minutes, without exception. I checked the versions of some of the Guardium files and, as far as I can tell, the app is at a version prior to the stable one. The app owners will have to confirm that.
I feel like I should contribute something, so for anyone who needs to know how to uninstall an app in safe mode, see this:
http://www.windowsnetworking.com/kbase/WindowsTips/WindowsXP/AdminTips/Miscellaneous/UninstallapplicationsinSafeMode.html
Stopping the McAfee services didn’t seem to be enough to stop its processes from loading, so the uninstall was necessary.
Thanks very much for the help.

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Charles F Sullivan
*Sent:* Friday, December 16, 2016 9:39 AM
*To:* [email protected]
*Subject:* RE: [NTSysADM] Analyzing Minidumps

Thanks much Michael and David. I think there’s a very good chance that you’ve hit the cause. We’re going to check the Guardium version on the crashed server to see if it’s one of the affected versions. We’ll also disable the McAfee services then boot into normal mode to see if the server stays up.
I’m going to look at the PS windbg script so that I have something a bit better for troubleshooting issues like this, so thanks for that. This is very helpful.

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Michael B. Smith
*Sent:* Thursday, December 15, 2016 6:57 PM
*To:* [email protected]
*Subject:* RE: [NTSysADM] Analyzing Minidumps

Here is something I just came across. I haven’t used it yet, but I certainly intend to:
http://www.leeholmes.com/blog/2009/01/21/scripting-windbg-with-powershell/
(I own this book and read it front-to-back when I got it, but I’m getting old enough to where CRS is always there…)

*From:* Michael B. Smith
*Sent:* Thursday, December 15, 2016 5:53 PM
*To:* [email protected]
*Subject:* RE: [NTSysADM] Analyzing Minidumps

The page I mentioned tells you how to find the object (variable or procedure) that caused the dump. Granted, using windbg.
As for broken AV:
http://www-01.ibm.com/support/docview.wss?uid=swg21971076

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Charles F Sullivan
*Sent:* Thursday, December 15, 2016 5:41 PM
*To:* [email protected]
*Subject:* RE: [NTSysADM] Analyzing Minidumps

Sorry, clfs.sys is correct. I had already found the page you reference when I searched for REFERENCE_BY_POINTER, but I’m not sure it gives me something to look for.
What is it that makes you think broken AV? I will try to pursue that. We use VirusScan Enterprise 8.8 and I can look at its logs, which I hadn’t thought of trying.
Thanks for the help.

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Michael B. Smith
*Sent:* Thursday, December 15, 2016 5:10 PM
*To:* [email protected]
*Subject:* RE: [NTSysADM] Analyzing Minidumps

After a few minutes of reading around, it sounds like a broken AV to me.
You wrote below CLFSYS.SYS. Did you mean clfs.sys? Because I don’t think there is a CLFSYS.SYS, which would lead me to think “virus”.
This is also a worthwhile read:
https://msdn.microsoft.com/en-us/library/windows/hardware/ff557386(v=vs.85).aspx

*From:* [email protected] [mailto:[email protected]] *On Behalf Of* Charles F Sullivan
*Sent:* Thursday, December 15, 2016 3:45 PM
*To:* [email protected]
*Subject:* [NTSysADM] Analyzing Minidumps

This is something I find myself needing to do only occasionally. I usually use BlueScreenView to read minidumps after a crash because it’s quicker and easier than windbg. Regardless of the tool, when multiple crashes are caused by the same device driver, the culprit is pretty obvious. When system drivers are listed as the cause along with ntoskrnl.exe, it’s not so apparent, so I’m not really sure what I’m seeing.
We had a Windows Server 2012 R2 VMware VM that suddenly began crashing on a Sunday night when nothing such as a backup or other intense operation was happening.
The server kept crashing after the first time, and the only way to stop it was to boot into Safe Mode. We ended up rebuilding the server.
Along with ntoskrnl.exe on each crash, each of the 12 minidumps lists one of these as the cause.

This is from the first crash and happened only once:
Fs_Rec.sys (File System Recognizer Driver)

This was listed as the cause on 9 of the subsequent crashes. I believe it’s from the Guardium application, which is used for monitoring database servers:
Nptrc.sys

This one twice:
CLFSYS.SYS (Common Log File System Driver)

The Bug Check String for all of them is REFERENCE_BY_POINTER and the Bug Check Code is 0x00000018.
Is this enough information for someone to give an opinion as to the likely root cause, or maybe what else to look for? It’s hard for me to blame the Guardium application since it wasn’t shown as the cause on all of the crashes.
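
For anyone who would rather tally a pile of minidumps than read them one at a time, here is a minimal Python sketch of that idea. It is only an illustration under stated assumptions, not something anyone in this thread ran: it assumes the Debugging Tools for Windows are installed and that `kd.exe` is on the PATH, and the function names (`kd_command`, `faulting_image`, `tally_images`) are made up for this example. The sample strings are stand-ins that mirror the 1/9/2 split BlueScreenView reported above; they are not real debugger output, and windbg's own analysis may well name different images.

```python
import re
from collections import Counter

# `!analyze -v` prints an "IMAGE_NAME:" field for the image it blames.
IMAGE_NAME_RE = re.compile(r"^IMAGE_NAME:\s*(\S+)", re.MULTILINE)


def kd_command(dump_path, kd_exe="kd.exe"):
    """Build a kd command line: open one dump (-z), run !analyze -v, quit (-c).

    Assumption: kd.exe is on the PATH; pass a full path as kd_exe otherwise.
    The list can be handed to subprocess.run(..., capture_output=True, text=True).
    """
    return [kd_exe, "-z", dump_path, "-c", "!analyze -v; q"]


def faulting_image(analyze_output):
    """Return the lower-cased IMAGE_NAME from analysis text, or None if absent."""
    match = IMAGE_NAME_RE.search(analyze_output)
    return match.group(1).lower() if match else None


def tally_images(outputs):
    """Count how often each image is blamed across many analysis outputs."""
    return Counter(faulting_image(text) or "unknown" for text in outputs)


# Illustrative stand-ins only (NOT real debugger output): one string per dump,
# using the per-driver counts reported in the message above.
sample_outputs = (
    ["IMAGE_NAME:  Fs_Rec.sys\n"] * 1
    + ["IMAGE_NAME:  Nptrc.sys\n"] * 9
    + ["IMAGE_NAME:  clfs.sys\n"] * 2
)

counts = tally_images(sample_outputs)
for image, n in counts.most_common():
    print(f"{image}: {n} of {len(sample_outputs)}")
```

A tally like this only shows which image gets blamed most often. As the thread shows, the image named in a dump is not necessarily the root cause, so treat the counts as a starting point for what to look at next.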

