On Sun, Jan 07, 2007 at 11:49:56AM +0000, Robert Watson wrote:
> On Sat, 6 Jan 2007, Ceri Davies wrote:
>
> >>>So far it's happened this morning and yesterday morning. I haven't seen
> >>>it before that. I don't know the cause so I can't reproduce it at will,
> >>>but the logs don't give any indication. Chances are that it will happen
> >>>again tomorrow, but we'll see.
> >>
> >>Hmm. It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without
> >>the array index. Could you repeat that, but with the array index --
> >>i.e., td->td_proc->p_fd->fd_ofiles[uap->fd]? Also, it would probably be
> >>useful to print uap->fd. Right now you're printing stdin (index 0), but
> >>if the index is non-0, we want a different file.
> >
> >Very tactfully put :) Sorry about that.
> >
> >None of the uap->fd's seem to be valid. In the first case, uap->fd is way
> >too high for the length of fd_ofiles, which only has 21 elements:
> >
> >(kgdb) up 8
> >#8 0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075 error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = 89
> >(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
> >Cannot access memory at address 0x0
> >
> >In the second, uap->fd is nonsense:
> >
> >(kgdb) up 8
> >#8 0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075 error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = -1023449232
> >(kgdb)
>
> Hmm. So, I reviewed audit_arg_file() closely, and after staring at the
> code a lot, couldn't see anything obvious in either the socket or the
> vnode/fifo case. I did fix one other bug there, however, which can never
> actually be exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and
> will MFC that in a week.
OK, thanks.
> Could you try printing *td->td_ar? Maybe this will give us a clue as to
> how far it got. In particular, this may be able to more reliably give us
> the file descriptor number, which is audited early in the system call. You
> might find that 'td' is corrupted in many layers of the stack, keep going
> up until you find one where it's good. It may well be that
> td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
> correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
> ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some
> more about the file descriptor even though it appears to have vanished.
*td->td_ar is null (0x0) in both cases...
> I'm quite worried by the fact that the file descriptor seems not to be
> present any more -- this suggests a file descriptor related race of the
> sort that is both quite difficult to figure out and also quite a risk.
> It's strange that it would only trigger with audit, however--perhaps audit
> stretches out the race. Is this an SMP box?
It's certainly looking quite nasty. This system is UP hardware without
options SMP.
> Could you print the entire contents of *td->td_proc->p_fd?
First case:
(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc3441000, fd_ofileflags = 0xc3441100 "",
  fd_cdir = 0xc367f110, fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0,
  fd_nfiles = 64, fd_map = 0xc3b65970, fd_lastfile = 20,
  fd_freefile = 16, fd_cmask = 63, fd_refcnt = 1, fd_holdcnt = 1,
  fd_mtx = {mtx_object = {lo_class = 0xc06ad4c4,
      lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0},
  fd_locked = 0, fd_wanted = 0, fd_kqlist = {slh_first = 0x0},
  fd_holdleaderscount = 0, fd_holdleaderswakeup = 0}
Second case:
(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc2d23600, fd_ofileflags = 0xc2d23700 "",
  fd_cdir = 0xc31b8660, fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0,
  fd_nfiles = 64, fd_map = 0xc2e9c1c0, fd_lastfile = 20,
  fd_freefile = 17, fd_cmask = 63, fd_refcnt = 1, fd_holdcnt = 1,
  fd_mtx = {mtx_object = {lo_class = 0xc06ad4c4,
      lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0},
  fd_locked = 0, fd_wanted = 0, fd_kqlist = {slh_first = 0x0},
  fd_holdleaderscount = 0, fd_holdleaderswakeup = 0}
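To make the comparison against these dumps concrete: both show fd_nfiles = 64, so a uap->fd of 89 or -1023449232 lies outside the table either way. Below is a rough userland sketch of that range check. It is only a model: model_file, model_filedesc and model_fd_lookup are hypothetical names made up for illustration, not the kernel's structures or its actual lookup path, and it ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins, NOT the kernel's definitions. */
struct model_file {
    int f_type;
};

struct model_filedesc {
    struct model_file **fd_ofiles; /* table of open-file pointers */
    int fd_nfiles;                 /* number of slots in fd_ofiles */
};

/*
 * Return the file for descriptor fd, or NULL if fd is outside the
 * table or the slot is empty. The range check comes first so that an
 * out-of-range fd never indexes the array.
 */
static struct model_file *
model_fd_lookup(const struct model_filedesc *fdp, int fd)
{
    assert(fdp != NULL);
    if (fd < 0 || fd >= fdp->fd_nfiles)
        return NULL;
    return fdp->fd_ofiles[fd];
}
```

With a 64-slot table, model_fd_lookup() returns NULL for both 89 and -1023449232 without touching the array, and returns the stored pointer only for an in-range slot that is populated.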
If it's at all useful, I can provide access to this system and the
dumps.
Ceri
--
That must be wonderful! I don't understand it at all.
-- Moliere
