Re: [CentOS] Commands failing silently?

2008-03-27 Thread Dan Bongert

Kai Schaetzl wrote:

William L. Maltby wrote on Tue, 25 Mar 2008 16:18:51 -0400:


~ ? Got me on that one.


home dir plus prompt. It looks funny, yes :-)


Yup, that's exactly it -- I had run that command from my homedir instead of 
from /tmp.


--
Dan Bongert [EMAIL PROTECTED]

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Commands failing silently?

2008-03-27 Thread Dan Bongert

mouss wrote:

Dan Bongert wrote:

mouss wrote:

Dan Bongert wrote:

Hello all:

I have a couple CentOS 4 servers (all up-to-date) that are having 
strange command failures. I first noticed this with a perl script 
that uses lots of system calls.


thoth(66) /tmp uname -a
Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 
06:54:55 EDT 2008 i686 i686 i386 GNU/Linux


Nothing in either dmesg or /var/log/messages seems to indicate any 
problems. It also doesn't seem to matter what the command is -- ls 
is the quickest test, but sshd will sometimes fail to spawn 
children, etc. There aren't a large number of processes on the 
machine either -- only 122 at the moment.


Has anyone seen this behavior before? Have I been hit with some sort 
of cunning rootkit? This machine shouldn't be publicly accessible; 
it's behind our firewall.


where is /tmp mounted? is this an external disk (usb, ...)? is it an 
nfs mount?


It's a local disk:

thoth(97) /tmp df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/md4               16G   77M   15G   1% /tmp

Though 'ls' was just an example -- just about any program will fail. 
The 'w' command will fail too:



maybe check your PATH. try
$ /bin/ls


Ok, here's a heck of a thing. When I run 'ls' using the full path (and also 
when I unalias it -- I have 'ls' aliased to 'ls -F --color'), 'ls' no longer 
fails.


However, my other test case, 'w', still fails.

(and these are all test cases because I noticed a nightly job with a lot of 
system() calls was failing).



--
Dan Bongert [EMAIL PROTECTED]



Re: [CentOS] Commands failing silently?

2008-03-27 Thread Dan Bongert

Filipe Brandenburger wrote:

Hi,

On Tue, Mar 25, 2008 at 2:21 PM, Dan Bongert [EMAIL PROTECTED] wrote:

 thoth(3) /tmp ls

 thoth(4) /tmp echo $?
 141


141 is SIGPIPE. If the process is killed by a signal, the return code
will be 128+signal number. 141-128=13, and kill -l says: 13) SIGPIPE.

SIGPIPE means that something that ls is writing to is being closed.
That's really strange, and I couldn't find why.
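
The 128+signal arithmetic above can be checked from the shell. This is a minimal sketch, not something posted in the thread: the function name decode_status is made up for illustration, and it assumes a shell whose built-in kill -l accepts a signal number and prints the name without the SIG prefix (bash behaves this way).

```shell
#!/bin/bash
# Hypothetical helper: interpret a shell exit status.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        local signum=$((status - 128))
        # kill -l N maps a signal number to its name (without "SIG")
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited with status $status"
    fi
}

# The status Dan saw after the silent 'ls'
decode_status 141
```

On Linux, where signal 13 is SIGPIPE, the last line reports signal 13 for status 141. In an interactive session the same check is just running the command and then passing $? to the function.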

I still think strace would be the best way to trace it. Please try:

# rm -f /tmp/ls-strace.txt; strace -o /tmp/ls-strace.txt -tt -s 1024 -f ls --color=tty

Repeat it until ls doesn't print anything. Then open
/tmp/ls-strace.txt in less; you'll probably have something like "+++
killed by SIGPIPE +++" as the last line of it. Then try to figure out
what happened before it got the SIGPIPE. It was probably a write to
something; try to figure out which file descriptor it wrote to. If you
can't, try to post the last few lines of the file here.
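
The "repeat it until it fails" step can be scripted rather than typed by hand. The sketch below is an illustration only: run_until_signal is a made-up name, and it relies on the assumption that the command being run (including strace, when used as a wrapper) exits with a status above 128 if it, or the process it traces, is killed by a signal. Check that assumption against the strace version installed.

```shell
#!/bin/bash
# Hypothetical helper: re-run a command until it dies from a signal
# (exit status above 128), giving up after a maximum number of tries.
run_until_signal() {
    local max=$1; shift
    local try=1 status
    while [ "$try" -le "$max" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -gt 128 ]; then
            echo "try $try: status $status"
            return 0
        fi
        try=$((try + 1))
    done
    echo "no signal death in $max tries"
    return 1
}

# Example (the strace log is overwritten on every try, so the file left
# behind belongs to the failing run):
#   run_until_signal 50 strace -o /tmp/ls-strace.txt -tt -s 1024 -f ls --color=tty
```

Because the output file is rewritten on each attempt, the trace that remains when the loop stops is the one worth reading.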


I tried it, but as I said before, strace somehow interferes with what's 
going on. I wasn't able to get a program to fail via strace.



Also, can you post the output of this command?
# ls -la /proc/$$/fd/


thoth(265) /tmp ls -la /proc/$$/fd/

thoth(266) /tmp ls -la /proc/$$/fd/
total 5
dr-x------  2 dbongert dbongert  0 Mar 27 10:17 .
dr-xr-xr-x  3 dbongert dbongert  0 Mar 27 10:03 ..
lrwx------  1 dbongert dbongert 64 Mar 27 10:17 0 -> /dev/pts/0
lrwx------  1 dbongert dbongert 64 Mar 27 10:17 1 -> /dev/pts/0
lrwx------  1 dbongert dbongert 64 Mar 27 10:17 2 -> /dev/pts/0
lrwx------  1 dbongert dbongert 64 Mar 27 10:17 255 -> /dev/pts/0
lrwx------  1 dbongert dbongert 64 Mar 27 10:17 3 -> socket:[4425494]

--
Dan Bongert [EMAIL PROTECTED]



Re: [CentOS] Commands failing silently?

2008-03-27 Thread Dan Bongert

Ok, here I am replying to myself. On a lark, I tried to strace a different 
program, since I couldn't get strace + ls to fail. Here's the end of the 
output from 'strace w':


connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
poll([{fd=4, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT|POLLHUP}], 1, 5000) = 1
writev(4, [{"\2\0\0\0\1\0\0\0\2\0\0\0", 12}, {"0\0", 2}], 2) = -1 EPIPE (Broken pipe)

--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++

Looks like an nscd problem, and disabling nscd seems to fix it.
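
Reading the trace by eye works for a short log; for a long one, the same lookup can be sketched in shell. This is an illustration, not something from the thread: epipe_target is a made-up name, and it assumes the log is in strace's usual one-syscall-per-line format and that the failing descriptor was opened with connect().

```shell
#!/bin/bash
# Hypothetical helper: find the last write-like syscall that failed with
# EPIPE in an strace log, then show the connect() call for that descriptor.
epipe_target() {
    local log=$1 fd
    # Extract the fd (first argument) of the last write/writev/send/sendto
    # that returned -1 EPIPE
    fd=$(sed -nE 's/.*(writev|write|sendto|send)\(([0-9]+),.*= -1 EPIPE.*/\2/p' "$log" | tail -n 1)
    if [ -z "$fd" ]; then
        echo "no EPIPE write found"
        return 1
    fi
    echo "EPIPE on fd $fd"
    # Show the most recent connect() on that descriptor, if there was one
    sed -n "/connect($fd, /p" "$log" | tail -n 1
}

# Example:
#   strace -o /tmp/w-strace.txt w
#   epipe_target /tmp/w-strace.txt
```

Against the excerpt above, this would point at fd 4 and the connect() to the nscd socket, which is the same conclusion reached by hand.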

--
Dan Bongert [EMAIL PROTECTED]



Re: [CentOS] Commands failing silently?

2008-03-25 Thread Dan Bongert

William L. Maltby wrote:

On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:

mouss wrote:

Dan Bongert wrote:

Hello all:

<snip>




Though 'ls' was just an example -- just about any program will fail. The 'w'
command will fail too:

thoth(118) /tmp w
 16:06:51 up  5:34,  1 user,  load average: 0.94, 1.46, 2.04
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
dbongert pts/0    copland.ssc.wisc 14:16    0.00s  0.22s  0.05s w

thoth(119) /tmp w
 16:06:52 up  5:34,  1 user,  load average: 0.94, 1.46, 2.04
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
dbongert pts/0    copland.ssc.wisc 14:16    0.00s  0.22s  0.05s w

thoth(120) /tmp w

thoth(121) /tmp w



Hmmm... Sure it's failing? Maybe just the output is going somewhere
else? After the command runs, what does echo $? show? Does it even
work? Echo is a bash internal command, so I would expect it to never
fail.


Ok, it's definitely getting an error from somewhere:

thoth(3) /tmp ls

thoth(4) /tmp echo $?
141

Although:

thoth(31) ~ top


thoth(32) ~ echo $?
0


What is your output device? A serial terminal? If so, could be simple
flow control issues. In fact, any serial connection (even a PC emulating
a terminal) could suffer from flow control problems. And they would tend
to be erratic in nature.


I'm usually sshing into the machine, but I've also experienced the problem
on the console.


If you are on a normal console, try running the commands similar to
this (trying to determine if *something* else is receiving output or
not)

your command > /dev/tty

if this works reliably, maybe that's a starting point.


Nope, that fails intermittently as well.


There's a couple kernel guys who frequent this list. Maybe one of them
will have a clue as to what could go wrong. Corrupted libraries and
whatnot.

You might try that rpm -V command earlier against all packages (add an
'a', IIRC). Maybe some library accessed by the coreutils, but which is
not itself part of coreutils, is corrupt.


Hmm... when I do a 'rpm -Va', I get lots of "at least one of file's
dependencies has changed since prelinking" errors. That happens even if I
run prelink manually and then do a 'rpm -Va' immediately afterwards.
--
Dan Bongert [EMAIL PROTECTED]



[CentOS] Commands failing silently?

2008-03-24 Thread Dan Bongert

Hello all:

I have a couple CentOS 4 servers (all up-to-date) that are having strange 
command failures. I first noticed this with a perl script that uses lots of 
system calls.


Basically, sometimes a command just won't run:

thoth(52) /tmp ls

thoth(53) /tmp ls

thoth(54) /tmp ls

thoth(55) /tmp ls
learner  lost+found/

thoth(56) /tmp ls
learner  lost+found/

thoth(57) /tmp ls
learner  lost+found/

thoth(58) /tmp ls
learner  lost+found/

thoth(59) /tmp ls
learner  lost+found/

thoth(60) /tmp ls
learner  lost+found/

thoth(61) /tmp ls
learner  lost+found/

thoth(62) /tmp ls

thoth(63) /tmp ls

thoth(64) /tmp ls

thoth(65) /tmp ls

thoth(66) /tmp uname -a
Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 
2008 i686 i686 i386 GNU/Linux
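
The on-again, off-again pattern in the transcript above can be quantified with a small loop. This is a minimal sketch, not part of the original report: count_failures is a made-up name, and it treats any nonzero exit status as a failure, which is an assumption about what "won't run" means here.

```shell
#!/bin/bash
# Hypothetical helper: run a command N times and report how many runs
# ended with a nonzero exit status.
count_failures() {
    local runs=$1; shift
    local i=0 failures=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status is of interest
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}

# Example:
#   count_failures 100 ls /tmp
```

A count taken before and after a configuration change gives a rough way to tell whether the change made any difference.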


Nothing in either dmesg or /var/log/messages seems to indicate any problems. 
It also doesn't seem to matter what the command is -- ls is the quickest 
test, but sshd will sometimes fail to spawn children, etc. There aren't a 
large number of processes on the machine either -- only 122 at the moment.


Has anyone seen this behavior before? Have I been hit with some sort of 
cunning rootkit? This machine shouldn't be publicly accessible; it's behind 
our firewall.


Thanks.
--
Dan Bongert [EMAIL PROTECTED]



Re: [CentOS] Commands failing silently?

2008-03-24 Thread Dan Bongert

Bill Campbell wrote:

On Mon, Mar 24, 2008, Dan Bongert wrote:

Hello all:

I have a couple CentOS 4 servers (all up-to-date) that are having strange 
command failures. I first noticed this with a perl script that uses lots of 
system calls.


Basically, sometimes a command just won't run:

thoth(52) /tmp ls


...

thoth(66) /tmp uname -a
Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT 
2008 i686 i686 i386 GNU/Linux


Nothing in either dmesg or /var/log/messages seems to indicate any 
problems. It also doesn't seem to matter what the command is -- ls is the 
quickest test, but sshd will sometimes fail to spawn children, etc. 
There aren't a large number of processes on the machine either -- only 122 
at the moment.


There is a very good chance that the machine has been cracked,
and the system's /bin/ls routine replaced by one hacked to hide
the cracker's programs.  ``rpm -V coreutils procps util-linux''
may well show several critical programs changed.


Everything seems OK there:


thoth(96) /tmp sudo rpm -V coreutils procps util-linux


You can also try running ``strace /bin/ls'' to see what is going on.


Funnily enough, running strace will work just fine. Though, as I said, just
about any command will fail -- 'ls' was just for testing purposes.


Bill
--
INTERNET:   [EMAIL PROTECTED]  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
FAX:(206) 232-9186  Mercer Island, WA 98040-0820; (206) 236-1676

When I hear a man applauded by the mob I always feel a pang of pity
for him.  All he has to do to be hissed is to live long enough.
-- H.L. Mencken, Minority Report


--
Dan Bongert [EMAIL PROTECTED]
