Brian,

You beat me to the punch. After a few days of trying to figure out the pattern, I found this was only happening when the distributed nodes were trying to do host checks. Further digging revealed that we were using 'fping' to check host reachability, and its output did include a ','. The "send" shell script I was using at the time passed -d ',' to send_nsca to use as a delimiter.
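To make the failure concrete, here is a minimal Python sketch of how a plain delimiter split can turn a 3-field host check into what looks like a 4-field service check. It is a simplified model under stated assumptions, not send_nsca's actual parsing code: the names `build_host_check` and `classify` are made up for illustration, and the sample plugin text is hypothetical apart from HOSTXXX and the trailing ' rta=0.140000 ms)' fragment, which come from this thread.

```python
# Simplified model of delimiter-based field splitting; NOT send_nsca's real code.
# Assumed field layouts:
#   host check:    host <d> return_code <d> plugin_output            (3 fields)
#   service check: host <d> service <d> return_code <d> plugin_output (4 fields)

# Hypothetical fping-style plugin output. Only the tail after the comma
# (" rta=0.140000 ms)") is taken from the debug line in this thread.
PLUGIN_OUTPUT = "FPING OK - HOSTXXX (loss=0%, rta=0.140000 ms)"


def build_host_check(host: str, code: int, output: str, delimiter: str) -> str:
    """Join the three host-check fields with the chosen delimiter."""
    return delimiter.join([host, str(code), output])


def classify(line: str, delimiter: str = "\t") -> dict:
    """Split a submitted line into at most 4 fields and guess the check type."""
    fields = line.rstrip("\n").split(delimiter, 3)
    if len(fields) == 3:
        host, code, output = fields
        return {"type": "host", "host": host, "code": code, "output": output}
    if len(fields) == 4:
        host, service, code, output = fields
        return {"type": "service", "host": host, "service": service,
                "code": code, "output": output}
    raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")


comma_line = build_host_check("HOSTXXX", 0, PLUGIN_OUTPUT, ",")
tab_line = build_host_check("HOSTXXX", 0, PLUGIN_OUTPUT, "\t")

# The comma inside the plugin output adds a fourth field, so the line is
# misread as a service check whose "service" is really the return code.
print(classify(comma_line, ","))

# With a tab delimiter the same data stays at three fields.
print(classify(tab_line, "\t"))
```

With the comma delimiter the sketch reports a service check named '0' whose output is ' rta=0.140000 ms)', which lines up with the NSCA debug line quoted further down. With a tab delimiter the same data is still classified as a host check.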
So while the actual host check was sending only 3 fields to the send_service_check script, the arguments to send_nsca were causing it to be broken into 4 fields, so the NSCA daemon assumed it was a service check.

Not that it matters which I use, but I switched over to Ethan's script. I guess when no arguments are passed to send_nsca, it breaks on a tab as a delimiter. Anyway, that part of my migration has been working fine.

Glad to see the whole 64-bit business was a red herring in my setup (whew!).

Thanks for your help.

Mark

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brian A. Seklecki
Sent: Saturday, January 19, 2008 12:32 PM
To: Frost, Mark {PBG}
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Problem with some NSCA packets getting corrupted on 64-bit SLES 10

MF:

Show us your ocsp_command and ochp_command mappings. Are you calling a piped command from checkcommands.cfg or calling an external shell script?

I guarantee you the comma (",") in results is being mapped into a field delimiter, which confuses nscad(8).

~~BAS

On Thu, 2008-01-17 at 10:37 -0500, Frost, Mark {PBG} wrote:
> I've recently begun an effort to move our Nagios installation to a
> distributed architecture from a centralized one. I had previously used
> NSCA only for a very few passive checks and it works fine on a 32-bit
> Red Hat AS 3 platform (the centralized server).
>
> In testing on a distributed architecture (which is 64-bit Suse Linux
> Enterprise Server (SLES) 10), I seem to have a problem with NSCA. (Note
> that all Nagios and NSCA binaries and libraries were recompiled on the
> 64-bit platform).
>
> After I broke out all the checks to have 2 separate distributed nodes
> send to a central server, I saw a few messages like this one in the
> nagios.log file:
>
> [1200583727] Warning: Passive check result was received for service '0' on host 'HOSTXXX', but the service could not be found!
>
> but only about every 1 out of 10 or maybe 20 results was doing this.
> That is, the rest of the results were being correctly shown as "EXTERNAL
> COMMAND" and all expected NSCA fields came up correctly (hostname,
> service desc, check result, text output).
>
> I started having the "send_nsca" script from the distributed nodes log
> what they were sending to a file. When I correlate what they're sending
> with what the NSCA daemon thinks it's receiving, the client is still
> sending the correct 4 fields, but it's as if the NSCA daemon is dropping
> the 2nd field (service desc) and replacing it with the check result
> field. So ultimately, it thinks the service name is '0'.
>
> I can't see that this matches a pattern (i.e. always on the same hosts
> or same service checks). All I've seen so far is that it happens
> whether I run NSCA as --single or --daemon. It also happens even if I
> turn off one of the distributed nodes (that is, I can't see it being
> volume related).
>
> I have turned on debugging in the NSCA daemon to see what it thinks it's
> getting and it echoes what the nagios.log shows:
>
> SERVICE CHECK -> Host Name: 'HOSTXXX', Service Description: '0', Return Code: '0', Output: ' rta=0.140000 ms)'
>
> Again, maybe only 1 out of 10. Ultimately, this causes the server to
> run an active check as it thinks it never got a result from the
> distributed node.
>
> I'm still trying to dig deeper, but it seems to me that this is
> increasingly pointing to some issue with 64-bit SLES. Or perhaps some
> variable type in NSCA daemon that's not quite right for 64-bit. It's
> hard to tell with its intermittent nature and the fact that I have yet
> to discover a pattern.
>
> Has anyone seen anything like this before?
>
> Thanks
>
> Mark
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null