Pete,

I'm near positive that it's not system resources that are causing Sniffer to not be able to _access the files_. I believe these errors are a symptom and not the cause.

You have to keep in mind that on the messages that don't throw errors, they were taking 30-90 seconds to scan, but immediately after a restart it was under 1 second. The system stayed the same, it was just the state of the service that was off in a bad way.

I did add a larger client about a month ago around the time that this started, which did inch up load by between 1% and 5% I figure, but I can't say for sure that the two things are connected. I've seen much bigger changes however in spam volumes from single spammers. I have looked at my SNFclient.exe.err log and found that the previous slowdowns were all represented in this file, and nothing else really since a smattering in 2012 of other stuff. I believe that I/O could be the trigger, or general system load, but the error in the service that misses opening some files, and is otherwise slower than normal by 100 times, will persist when everything else is fine again. I figure that this is all triggered by a short-term lack of resources or a killer message type of issue that does something like run away with memory. Certainly there were no recent changes on the server prior to this starting to happen, including Sniffer itself which has been perfectly solid up until 5/22.

Regarding the ERROR_MSG_FILE batch that I sent you in that log, it did happen exactly when I restarted Sniffer, and in fact the SNFclient.exe.err log showed a different error while this was happening, and maybe this will point you to something else? That log says "Could Not Connect!" when the regular Sniffer log shows "ERROR_MSG_FILE" about 1/8th of the time while in a bad state. When I restarted the Sniffer service, the regular log showed a bunch of "ERROR_MSG_FILE" in a row, but the SNFclient.exe.err log below shows "XCI Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek". You can match the message ID's with the other log that I provided. I believe that block of messages was already called to SNFclient.exe, but the Sniffer service haddn't yet responded, and so they were dumped as a batch into both logs during shut down of the service.

   20130627183807, arg1=F:\\proc\work\D862600e600004269.smd : Could Not
   Connect!
   20130627183808, arg1=F:\\proc\work\D864401770000431f.smd : Could Not
   Connect!
   20130627183808, arg1=F:\\proc\work\D861200ce000041ce.smd : Could Not
   Connect!
   20130627183809, arg1=F:\\proc\work\D8644017300004321.smd : Could Not
   Connect!
   20130627183809, arg1=F:\\proc\work\D861400da000041e3.smd : Could Not
   Connect!
   20130627183810, arg1=F:\\proc\work\D862600d70000425f.smd : Could Not
   Connect!
   20130627183811, arg1=F:\\proc\work\D864a00e900004346.smd : Could Not
   Connect!
   20130627183811, arg1=F:\\proc\work\D8615019b000041f4.smd : Could Not
   Connect!
   20130627183813, arg1=F:\\proc\work\D862900e900004282.smd : Could Not
   Connect!
   20130627183815, arg1=F:\\proc\work\D863d015800004306.smd : Could Not
   Connect!
   20130627183817, arg1=F:\\proc\work\D860301580000416f.smd : Could Not
   Connect!
   20130627183818, arg1=F:\\proc\work\D862300e900004255.smd : Could Not
   Connect!
   20130627183819, arg1=F:\\proc\work\D862900e600004281.smd : Could Not
   Connect!
   20130627183819, arg1=F:\\proc\work\D864b00d700004357.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D864800d70000433c.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8619017300004205.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D861d017700004230.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8641016d00004310.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865000e600004363.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865000e100004361.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D85fe00e600004152.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8610016d000041c5.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D861f00ce00004231.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8644018b00004320.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D86150173000041ef.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D864100da00004314.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8644009500004329.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8633016d000042c1.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D861200da000041d1.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865700d700004376.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D862c01770000429a.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D86360170000042ce.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D86360183000042d0.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8655019700004373.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D863600e9000042cd.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D86350177000042c6.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D862500ce0000425b.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D864400da00004327.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8644016d00004325.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D864800e10000433d.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D862600e10000426e.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865c00e600004391.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8638018b000042e1.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865d009500004394.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8628017300004273.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8657018000004378.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D86390177000042eb.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865d00d700004396.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D8659015800004388.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D865700e10000437b.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183819, arg1=F:\\proc\work\D863500ce000042c7.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D864a015800004348.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D864800e600004343.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D865b01970000438f.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D865c016d00004392.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D865b00e10000438c.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D865b00e90000438b.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D864a017d00004347.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D86380180000042df.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D861901770000420d.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D865700950000437a.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D8629017700004280.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D8638016d000042db.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D862b00e600004292.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D86380173000042e0.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D865d018000004393.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D86310177000042b1.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
   20130627183820, arg1=F:\\proc\work\D85b100e300003f7f.smd : XCI
   Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek


I am looking to retool presently just because it's time. So if you are convinced that this is due to low resources, don't concern yourself with it.

Matt


On 6/28/2013 10:36 AM, Pete McNeil wrote:
On 2013-06-27 20:01, Matt wrote:
I'm attaching a snippet of my log. About 100 lines past the start, where you see a smattering of error messages, you then see a large block of them while the Sniffer service is restarting, and then after that no errors at all. There have in fact been no errors at all in several hours since this restart of Sniffer.

I can promise you that the error you're reporting comes directly from a problem with the file system. Here is the code where that error is generated. Put simply the code tries to open the file and determine it's size. If that doesn't work it throws the ERROR_MSG_FILE exception in one of two forms -- that's what ends up in the log.

|     *try*  {/// Try opening the message file.
/|/||/|         MessageFile.open(MessageFilePath.c_str(), ios::in | 
ios::binary);/// Open the file, binary mode.
/|/||/|         MessageFile.seekg(0, ios::end);/// Find the end of the file,
/|/||/|         MessageFileSize = MessageFile.tellg();/// read that position as 
the size,
/|/||/|         MessageFile.seekg(0, ios::beg);/// then go back to the 
beginning.
/|/||/|         MyScanData.ScanSize = MessageFileSize;/// Capture the message 
file size.
/|/||/|     }
||     *catch*(...) {/// Trouble? Throw FileError.
/|/||/|         MyRulebase->MyLOGmgr.logThisError(/// Log the error.
/|/||/|           MyScanData,*"scanMessageFile().open"*,
||           snf_ERROR_MSG_FILE,*"ERROR_MSG_FILE"*
||         );
||         *throw*  FileError(*"snf_EngineHandler::scanMessageFile() 
Open/Seek"*);
||     }
||
||     *if*(0 >= MessageFileSize) {/// Handle zero length files.
/|/||/|         MessageFile.close();/// No need to keep this open.
/|/||/|         MyRulebase->MyLOGmgr.logThisError(/// Log the error.
/|/||/|           MyScanData,*"scanMessageFile().isFileEmpty?"*,
||           snf_ERROR_MSG_FILE,*"ERROR_MSG_FILE"*
||         );
||         *throw*  FileError(*"snf_EngineHandler::scanMessageFile() 
FileEmpty!"*);
||     }
|

Another clue is that in the log snippet you provide, there are hints of a problem brewing when there are sporadic instances of this error. Then, when there is a large block -- virtually all requests to open the files for scan are rejected by the OS. Either something made those files unavailable, or the OS was unable to handle the request. I find it interesting also that the time required to report the error started at about 172 milliseconds and continued to climb to 406, 578, and then 656 before the restart.

SNF does not make log entries in the classic log during a restart, by the way.

Note also the timestamps associated with these events and you can see that the event was precipitated by a dramatic rise in message rates. The first part of your log seems to indicate about 7-10 messages per second. During the large block of errors, the message rate appears to have been in excess of 120 (I counted approximately 126 at timestamp 20130627183819). That's an increase at least an order of magnitude higher than the rate that was causing sporadic errors.

I suspect based on the data you have provided that something on your system generated a very large spike of activity that your IO subsystem was unable to manage and this caused snf scans to fail because snf was unable to open the files it was asked to scan.

Your restart of SNF apparently coincided with the event, but since all of the SMD file names are unique during the event, and since SNF has no way to generate scan requests on it's own, SNF does not appear to have been the cause of the event in any way. It was able to record the event, none the less.

So the question in my mind now is:

* Is there a way to improve your IO subsystem so that it can gain some headroom above 10 msg/sec?
* What caused the sudden dramatic spike that led to this episode?

On most tiny systems I monitor, scan times are generally < 100 ms. On your system they are frequently in excess of 400 ms which leads me to believe your system is a bit underpowered for it's current load.

Hope this helps,
_M

--
Pete McNeil
Chief Scientist
ARM Research Labs, LLC
www.armresearch.com
866-770-1044 x7010
twitter/codedweller
#############################################################

This message is sent to you because you are subscribed to

   the mailing list <sniffer@sortmonster.com>.

This list is for discussing Message Sniffer,

Anti-spam, Anti-Malware, and related email topics.

For More information see http://www.armresearch.com

To unsubscribe, E-mail to: <sniffer-...@sortmonster.com>

To switch to the DIGEST mode, E-mail to <sniffer-dig...@sortmonster.com>

To switch to the INDEX mode, E-mail to <sniffer-in...@sortmonster.com>

Send administrative queries to  <sniffer-requ...@sortmonster.com>


Reply via email to