Pete,
I'm near positive that it's not system resources that are causing
Sniffer to not be able to _access the files_. I believe these errors
are a symptom and not the cause.
You have to keep in mind that on the messages that don't throw errors,
they were taking 30-90 seconds to scan, but immediately after a restart
it was under 1 second. The system stayed the same, it was just the
state of the service that was off in a bad way.
I did add a larger client about a month ago around the time that this
started, which did inch up load by between 1% and 5% I figure, but I
can't say for sure that the two things are connected. I've seen much
bigger changes however in spam volumes from single spammers. I have
looked at my SNFclient.exe.err log and found that the previous slowdowns
were all represented in this file, and nothing else really since a
smattering in 2012 of other stuff. I believe that I/O could be the
trigger, or general system load, but the error in the service that
misses opening some files, and is otherwise slower than normal by 100
times, will persist when everything else is fine again. I figure that
this is all triggered by a short-term lack of resources or a killer
message type of issue that does something like run away with memory.
Certainly there were no recent changes on the server prior to this
starting to happen, including Sniffer itself which has been perfectly
solid up until 5/22.
Regarding the ERROR_MSG_FILE batch that I sent you in that log, it did
happen exactly when I restarted Sniffer, and in fact the
SNFclient.exe.err log showed a different error while this was happening,
and maybe this will point you to something else? That log says "Could
Not Connect!" when the regular Sniffer log shows "ERROR_MSG_FILE" about
1/8th of the time while in a bad state. When I restarted the Sniffer
service, the regular log showed a bunch of "ERROR_MSG_FILE" in a row,
but the SNFclient.exe.err log below shows "XCI Error!: FileError
snf_EngineHandler::scanMessageFile() Open/Seek". You can match the
message ID's with the other log that I provided. I believe that block
of messages was already called to SNFclient.exe, but the Sniffer service
haddn't yet responded, and so they were dumped as a batch into both logs
during shut down of the service.
20130627183807, arg1=F:\\proc\work\D862600e600004269.smd : Could Not
Connect!
20130627183808, arg1=F:\\proc\work\D864401770000431f.smd : Could Not
Connect!
20130627183808, arg1=F:\\proc\work\D861200ce000041ce.smd : Could Not
Connect!
20130627183809, arg1=F:\\proc\work\D8644017300004321.smd : Could Not
Connect!
20130627183809, arg1=F:\\proc\work\D861400da000041e3.smd : Could Not
Connect!
20130627183810, arg1=F:\\proc\work\D862600d70000425f.smd : Could Not
Connect!
20130627183811, arg1=F:\\proc\work\D864a00e900004346.smd : Could Not
Connect!
20130627183811, arg1=F:\\proc\work\D8615019b000041f4.smd : Could Not
Connect!
20130627183813, arg1=F:\\proc\work\D862900e900004282.smd : Could Not
Connect!
20130627183815, arg1=F:\\proc\work\D863d015800004306.smd : Could Not
Connect!
20130627183817, arg1=F:\\proc\work\D860301580000416f.smd : Could Not
Connect!
20130627183818, arg1=F:\\proc\work\D862300e900004255.smd : Could Not
Connect!
20130627183819, arg1=F:\\proc\work\D862900e600004281.smd : Could Not
Connect!
20130627183819, arg1=F:\\proc\work\D864b00d700004357.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864800d70000433c.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8619017300004205.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D861d017700004230.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8641016d00004310.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865000e600004363.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865000e100004361.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D85fe00e600004152.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8610016d000041c5.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D861f00ce00004231.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8644018b00004320.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86150173000041ef.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864100da00004314.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8644009500004329.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8633016d000042c1.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D861200da000041d1.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865700d700004376.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D862c01770000429a.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86360170000042ce.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86360183000042d0.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8655019700004373.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D863600e9000042cd.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86350177000042c6.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D862500ce0000425b.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864400da00004327.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8644016d00004325.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D864800e10000433d.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D862600e10000426e.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865c00e600004391.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8638018b000042e1.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865d009500004394.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8628017300004273.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8657018000004378.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D86390177000042eb.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865d00d700004396.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D8659015800004388.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D865700e10000437b.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183819, arg1=F:\\proc\work\D863500ce000042c7.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D864a015800004348.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D864800e600004343.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865b01970000438f.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865c016d00004392.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865b00e10000438c.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865b00e90000438b.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D864a017d00004347.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D86380180000042df.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D861901770000420d.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865700950000437a.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D8629017700004280.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D8638016d000042db.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D862b00e600004292.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D86380173000042e0.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D865d018000004393.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D86310177000042b1.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
20130627183820, arg1=F:\\proc\work\D85b100e300003f7f.smd : XCI
Error!: FileError snf_EngineHandler::scanMessageFile() Open/Seek
I am looking to retool presently just because it's time. So if you are
convinced that this is due to low resources, don't concern yourself with it.
Matt
On 6/28/2013 10:36 AM, Pete McNeil wrote:
On 2013-06-27 20:01, Matt wrote:
I'm attaching a snippet of my log. About 100 lines past the start,
where you see a smattering of error messages, you then see a large
block of them while the Sniffer service is restarting, and then after
that no errors at all. There have in fact been no errors at all in
several hours since this restart of Sniffer.
I can promise you that the error you're reporting comes directly from
a problem with the file system. Here is the code where that error is
generated. Put simply the code tries to open the file and determine
it's size. If that doesn't work it throws the ERROR_MSG_FILE exception
in one of two forms -- that's what ends up in the log.
| *try* {/// Try opening the message file.
/|/||/| MessageFile.open(MessageFilePath.c_str(), ios::in |
ios::binary);/// Open the file, binary mode.
/|/||/| MessageFile.seekg(0, ios::end);/// Find the end of the file,
/|/||/| MessageFileSize = MessageFile.tellg();/// read that position as
the size,
/|/||/| MessageFile.seekg(0, ios::beg);/// then go back to the
beginning.
/|/||/| MyScanData.ScanSize = MessageFileSize;/// Capture the message
file size.
/|/||/| }
|| *catch*(...) {/// Trouble? Throw FileError.
/|/||/| MyRulebase->MyLOGmgr.logThisError(/// Log the error.
/|/||/| MyScanData,*"scanMessageFile().open"*,
|| snf_ERROR_MSG_FILE,*"ERROR_MSG_FILE"*
|| );
|| *throw* FileError(*"snf_EngineHandler::scanMessageFile()
Open/Seek"*);
|| }
||
|| *if*(0 >= MessageFileSize) {/// Handle zero length files.
/|/||/| MessageFile.close();/// No need to keep this open.
/|/||/| MyRulebase->MyLOGmgr.logThisError(/// Log the error.
/|/||/| MyScanData,*"scanMessageFile().isFileEmpty?"*,
|| snf_ERROR_MSG_FILE,*"ERROR_MSG_FILE"*
|| );
|| *throw* FileError(*"snf_EngineHandler::scanMessageFile()
FileEmpty!"*);
|| }
|
Another clue is that in the log snippet you provide, there are hints
of a problem brewing when there are sporadic instances of this error.
Then, when there is a large block -- virtually all requests to open
the files for scan are rejected by the OS. Either something made those
files unavailable, or the OS was unable to handle the request. I find
it interesting also that the time required to report the error started
at about 172 milliseconds and continued to climb to 406, 578, and then
656 before the restart.
SNF does not make log entries in the classic log during a restart, by
the way.
Note also the timestamps associated with these events and you can see
that the event was precipitated by a dramatic rise in message rates.
The first part of your log seems to indicate about 7-10 messages per
second. During the large block of errors, the message rate appears to
have been in excess of 120 (I counted approximately 126 at timestamp
20130627183819). That's an increase at least an order of magnitude
higher than the rate that was causing sporadic errors.
I suspect based on the data you have provided that something on your
system generated a very large spike of activity that your IO subsystem
was unable to manage and this caused snf scans to fail because snf was
unable to open the files it was asked to scan.
Your restart of SNF apparently coincided with the event, but since all
of the SMD file names are unique during the event, and since SNF has
no way to generate scan requests on it's own, SNF does not appear to
have been the cause of the event in any way. It was able to record the
event, none the less.
So the question in my mind now is:
* Is there a way to improve your IO subsystem so that it can gain some
headroom above 10 msg/sec?
* What caused the sudden dramatic spike that led to this episode?
On most tiny systems I monitor, scan times are generally < 100 ms. On
your system they are frequently in excess of 400 ms which leads me to
believe your system is a bit underpowered for it's current load.
Hope this helps,
_M
--
Pete McNeil
Chief Scientist
ARM Research Labs, LLC
www.armresearch.com
866-770-1044 x7010
twitter/codedweller
#############################################################
This message is sent to you because you are subscribed to
the mailing list <sniffer@sortmonster.com>.
This list is for discussing Message Sniffer,
Anti-spam, Anti-Malware, and related email topics.
For More information see http://www.armresearch.com
To unsubscribe, E-mail to: <sniffer-...@sortmonster.com>
To switch to the DIGEST mode, E-mail to <sniffer-dig...@sortmonster.com>
To switch to the INDEX mode, E-mail to <sniffer-in...@sortmonster.com>
Send administrative queries to <sniffer-requ...@sortmonster.com>