Re: [CentOS] Commands failing silently?
Kai Schaetzl wrote:
> William L. Maltby wrote on Tue, 25 Mar 2008 16:18:51 -0400:
>
>> ~ ? Got me on that one.
>
> home dir plus prompt. It looks funny, yes :-)

Yup, that's exactly it -- I had run that command from my homedir instead of from /tmp.

--
Dan Bongert [EMAIL PROTECTED]
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Commands failing silently?
mouss wrote:
> Dan Bongert wrote:
>> mouss wrote:
>>> Dan Bongert wrote:
>>>> Hello all:
>>>>
>>>> I have a couple CentOS 4 servers (all up-to-date) that are having strange command failures. I first noticed this with a perl script that uses lots of system calls.
>>>>
>>>> thoth(66) /tmp uname -a
>>>> Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux
>>>>
>>>> Nothing in either dmesg or /var/log/messages seems to indicate any problems. It also doesn't seem to matter what the command is -- ls is the quickest test, but sshd will sometimes fail to spawn children, etc. There aren't a large number of processes on the machine either -- only 122 at the moment.
>>>>
>>>> Has anyone seen this behavior before? Have I been hit with some sort of cunning rootkit? This machine shouldn't be publicly accessible; it's behind our firewall.
>>>
>>> where is /tmp mounted? is this an external disk (usb, ...)? is it an nfs mount?
>>
>> It's a local disk:
>>
>> thoth(97) /tmp df -h .
>> Filesystem  Size  Used Avail Use% Mounted on
>> /dev/md4     16G   77M   15G   1% /tmp
>>
>> Though 'ls' was just an example -- just about any program will fail. The 'w' command will fail too:
>
> maybe check your PATH. try
> $ /bin/ls

Ok, here's a heck of a thing. When I run 'ls' using the full path (and also when I unalias it -- I have 'ls' aliased to 'ls -F --color'), 'ls' no longer fails. However, my other test case, 'w', still fails. (And these are all test cases, because I noticed a nightly job with a lot of system() calls was failing.)

--
Dan Bongert [EMAIL PROTECTED]
Re: [CentOS] Commands failing silently?
Filipe Brandenburger wrote:
> Hi,
>
> On Tue, Mar 25, 2008 at 2:21 PM, Dan Bongert [EMAIL PROTECTED] wrote:
>> thoth(3) /tmp ls
>> thoth(4) /tmp echo $?
>> 141
>
> 141 is SIGPIPE. If the process is killed by a signal, the return code will be 128+signal number. 141-128=13, and kill -l says: 13) SIGPIPE.
>
> SIGPIPE means that something that ls is writing to is being closed. That's really strange, and I couldn't find why.
>
> I still think strace would be the best way to trace it. Please try:
>
> # rm -f /tmp/ls-strace.txt; strace -o /tmp/ls-strace.txt -tt -s 1024 -f ls --color=tty
>
> Repeat it until ls doesn't print anything. Then less your /tmp/ls-strace.txt file, you'll probably have something like +++ killed by SIGPIPE +++ as the last line of it. Then try to figure out what happened before it got the SIGPIPE. Probably a write to something, try to figure out to which file descriptor. If you can't do it, try to post the last few lines of the file here.

I tried it, but as I said before, strace somehow interferes with what's going on. I wasn't able to get a program to fail via strace.

> Also, can you post the output of this command?
>
> # ls -la /proc/$$/fd/

thoth(265) /tmp ls -la /proc/$$/fd/
thoth(266) /tmp ls -la /proc/$$/fd/
total 5
dr-x------ 2 dbongert dbongert  0 Mar 27 10:17 .
dr-xr-xr-x 3 dbongert dbongert  0 Mar 27 10:03 ..
lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 0 -> /dev/pts/0
lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 1 -> /dev/pts/0
lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 2 -> /dev/pts/0
lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 255 -> /dev/pts/0
lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 3 -> socket:[4425494]

--
Dan Bongert [EMAIL PROTECTED]
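The exit-status arithmetic described in this message (a shell reports 128 plus the signal number when a command is killed by a signal) can be sketched in a few lines of Python. This is only an illustration: the function name `describe_exit_status` is made up here, and a status above 128 can in principle also come from a program that simply called exit() with that value, so the result is a likely reading rather than a certainty.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status such as the value of $?.

    Hypothetical helper for illustration; not part of any tool in the thread.
    """
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            return f"exit status {status} (no known signal numbered {signum})"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {status}"


print(describe_exit_status(141))  # the value Dan saw after the silent 'ls'
print(describe_exit_status(0))    # the value he saw after 'top'
```

On Linux, where SIGPIPE is signal 13, the first call reports SIGPIPE, matching what `kill -l` shows.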
Re: [CentOS] Commands failing silently?
Dan Bongert wrote:
> Filipe Brandenburger wrote:
>> Hi,
>>
>> On Tue, Mar 25, 2008 at 2:21 PM, Dan Bongert [EMAIL PROTECTED] wrote:
>>> thoth(3) /tmp ls
>>> thoth(4) /tmp echo $?
>>> 141
>>
>> 141 is SIGPIPE. If the process is killed by a signal, the return code will be 128+signal number. 141-128=13, and kill -l says: 13) SIGPIPE.
>>
>> SIGPIPE means that something that ls is writing to is being closed. That's really strange, and I couldn't find why.
>>
>> I still think strace would be the best way to trace it. Please try:
>>
>> # rm -f /tmp/ls-strace.txt; strace -o /tmp/ls-strace.txt -tt -s 1024 -f ls --color=tty
>>
>> Repeat it until ls doesn't print anything. Then less your /tmp/ls-strace.txt file, you'll probably have something like +++ killed by SIGPIPE +++ as the last line of it. Then try to figure out what happened before it got the SIGPIPE. Probably a write to something, try to figure out to which file descriptor. If you can't do it, try to post the last few lines of the file here.
>
> I tried it, but as I said before, strace somehow interferes with what's going on. I wasn't able to get a program to fail via strace.
>
>> Also, can you post the output of this command?
>>
>> # ls -la /proc/$$/fd/
>
> thoth(265) /tmp ls -la /proc/$$/fd/
> thoth(266) /tmp ls -la /proc/$$/fd/
> total 5
> dr-x------ 2 dbongert dbongert  0 Mar 27 10:17 .
> dr-xr-xr-x 3 dbongert dbongert  0 Mar 27 10:03 ..
> lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 0 -> /dev/pts/0
> lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 1 -> /dev/pts/0
> lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 2 -> /dev/pts/0
> lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 255 -> /dev/pts/0
> lrwx------ 1 dbongert dbongert 64 Mar 27 10:17 3 -> socket:[4425494]

Ok, here I am replying to myself. On a lark, I tried to strace a different program, since I couldn't get strace + ls to fail.

Here's the end of the output from 'strace w':

connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
poll([{fd=4, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT|POLLHUP}], 1, 5000) = 1
writev(4, [{"\2\0\0\0\1\0\0\0\2\0\0\0", 12}, {"0\0", 2}], 2) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++

Looks like an nscd problem, and disabling nscd seems to fix it.

--
Dan Bongert [EMAIL PROTECTED]
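The failure in the strace output above (a write to a Unix-domain socket whose other end has gone away returns EPIPE and raises SIGPIPE) can be reproduced in isolation. The following is a minimal sketch on Linux, not a reconstruction of what nscd actually did: it uses a local socket pair instead of /var/run/nscd/socket, the payload bytes are arbitrary, and the function name `send_to_closed_peer` is invented for the example. Python ignores SIGPIPE at startup, so the process sees an exception rather than being killed the way 'w' and 'ls' were.

```python
import errno
import socket


def send_to_closed_peer():
    """Send on a Unix stream socket after the peer has closed.

    Returns the symbolic errno name of the failure, or "no error" if the
    send unexpectedly succeeds. Hypothetical helper for illustration.
    """
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    theirs.close()  # stand-in for the daemon end of the connection going away
    try:
        ours.send(b"\x02\x00\x00\x00")  # arbitrary payload, not the nscd protocol
        return "no error"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        ours.close()


print(send_to_closed_peer())
```

A C program that leaves SIGPIPE at its default disposition would be terminated by the signal at the same point, which is why the commands in this thread printed nothing and left 141 in $?.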
Re: [CentOS] Commands failing silently?
William L. Maltby wrote:
> On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:
>> mouss wrote:
>>> Dan Bongert wrote:
>>>> Hello all:
>
> <snip>
>
>> Though 'ls' was just an example -- just about any program will fail. The 'w' command will fail too:
>>
>> thoth(118) /tmp w
>>  16:06:51 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
>> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
>> dbongert pts/0    copland.ssc.wisc 14:16    0.00s  0.22s  0.05s w
>> thoth(119) /tmp w
>>  16:06:52 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
>> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
>> dbongert pts/0    copland.ssc.wisc 14:16    0.00s  0.22s  0.05s w
>> thoth(120) /tmp w
>> thoth(121) /tmp w
>
> Hmmm... Sure it's failing? Maybe just the output is going somewhere else? After the command runs, what does echo $? show? Does it even work? Echo is a bash internal command, so I would expect it to never fail.

Ok, it's definitely getting an error from somewhere:

thoth(3) /tmp ls
thoth(4) /tmp echo $?
141

Although:

thoth(31) ~ top
thoth(32) ~ echo $?
0

> What is your output device? A serial terminal? If so, could be simple flow control issues. In fact, any serial connection (even a PC emulating a terminal) could suffer from flow control problems. And they would tend to be erratic in nature.

I'm usually sshing into the machine, but I've also experienced the problem on the console.

> If you are on a normal console, try running the commands similar to this (trying to determine if *something* else is receiving output or not):
>
>   your command >/dev/tty
>
> If this works reliably, maybe that's a starting point.

Nope, that fails intermittently as well.

> There's a couple kernel guys who frequent this list. Maybe one of them will have a clue as to what could go wrong. Corrupted libraries and whatnot. You might try that rpm -V command from earlier against all packages (add an "a", IIRC). Maybe some library accessed by the coreutils, but which is not itself part of coreutils, is corrupt.

Hmm, when I do a 'rpm -Va', I get lots of "at least one of file's dependencies has changed since prelinking" errors, even if I run prelink manually and then do a 'rpm -Va' immediately afterwards.

--
Dan Bongert [EMAIL PROTECTED]
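Checking `echo $?` by hand, as in the message above, has a scripted equivalent that may help when a job with many system() calls fails without printing anything. The sketch below is in Python rather than the Perl of the original nightly job, and `run_and_report` and `CHILD` are names invented for the illustration. Python's `subprocess` reports a child killed by signal N as return code -N, where a shell would show 128+N.

```python
import signal
import subprocess
import sys


def run_and_report(argv):
    """Run a command with output discarded and describe how it ended.

    Hypothetical helper for illustration.
    """
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    code = result.returncode
    if code < 0:
        # A negative return code means the child was terminated by a signal.
        return f"killed by {signal.Signals(-code).name}"
    return f"exit status {code}"


# Test child: restores the default SIGPIPE action, then signals itself,
# imitating a command that dies the way 'ls' and 'w' did in this thread.
CHILD = (
    "import os, signal; "
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL); "
    "os.kill(os.getpid(), signal.SIGPIPE)"
)

print(run_and_report([sys.executable, "-c", CHILD]))
print(run_and_report([sys.executable, "-c", "pass"]))
```

Logging this description for every external command would make a silent death by signal show up in the job's output instead of only as missing results.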
[CentOS] Commands failing silently?
Hello all:

I have a couple CentOS 4 servers (all up-to-date) that are having strange command failures. I first noticed this with a perl script that uses lots of system calls. Basically, sometimes a command just won't run:

thoth(52) /tmp ls
thoth(53) /tmp ls
thoth(54) /tmp ls
thoth(55) /tmp ls
learner lost+found/
thoth(56) /tmp ls
learner lost+found/
thoth(57) /tmp ls
learner lost+found/
thoth(58) /tmp ls
learner lost+found/
thoth(59) /tmp ls
learner lost+found/
thoth(60) /tmp ls
learner lost+found/
thoth(61) /tmp ls
learner lost+found/
thoth(62) /tmp ls
thoth(63) /tmp ls
thoth(64) /tmp ls
thoth(65) /tmp ls
thoth(66) /tmp uname -a
Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux

Nothing in either dmesg or /var/log/messages seems to indicate any problems. It also doesn't seem to matter what the command is -- ls is the quickest test, but sshd will sometimes fail to spawn children, etc. There aren't a large number of processes on the machine either -- only 122 at the moment.

Has anyone seen this behavior before? Have I been hit with some sort of cunning rootkit? This machine shouldn't be publicly accessible; it's behind our firewall.

Thanks.

--
Dan Bongert [EMAIL PROTECTED]
Re: [CentOS] Commands failing silently?
Bill Campbell wrote:
> On Mon, Mar 24, 2008, Dan Bongert wrote:
>> Hello all:
>>
>> I have a couple CentOS 4 servers (all up-to-date) that are having strange command failures. I first noticed this with a perl script that uses lots of system calls. Basically, sometimes a command just won't run:
>>
>> thoth(52) /tmp ls
>> ...
>> thoth(66) /tmp uname -a
>> Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux
>>
>> Nothing in either dmesg or /var/log/messages seems to indicate any problems. It also doesn't seem to matter what the command is -- ls is the quickest test, but sshd will sometimes fail to spawn children, etc. There aren't a large number of processes on the machine either -- only 122 at the moment.
>
> There is a very good chance that the machine has been cracked, and the system's /bin/ls routine replaced by one hacked to hide the cracker's programs.
>
> ``rpm -V coreutils procps util-linux'' may well show several critical programs changed.

Everything seems OK there:

thoth(96) /tmp sudo rpm -V coreutils procps util-linux

> You can also try running ``strace /bin/ls'' to see what is going on.

Funnily enough, running strace will work just fine. Though, as I said, just about any command will fail -- 'ls' was just for testing purposes.

> Bill
> --
> INTERNET: [EMAIL PROTECTED] Bill Campbell; Celestial Software LLC
> URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
> FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676
>
> When I hear a man applauded by the mob I always feel a pang of pity for him. All he has to do to be hissed is to live long enough. -- H.L. Mencken, Minority Report

--
Dan Bongert [EMAIL PROTECTED]