On Sun, Feb 19, 2012 at 11:24 AM, Jan Stary <[email protected]> wrote:
> On Feb 19 10:12:03, Philip Guenther wrote:
>> On Sun, Feb 19, 2012 at 6:17 AM, Jan Stary <[email protected]> wrote:
>> > On a recent install of current/i386 on an ALIX (see dmesg below),
>> > processes (such as a simple 'ls') started to magically segfault and die.
>> >
>> > Feb 19 14:43:17 www /bsd: pid 26001 (bogofilter): user write of 4096@0x3d5b000 at 1776 failed: 14
>>
>> 14 == EFAULT. Those are generated when the kernel tries to write out
>> a process's memory image for a coredump and the indicated range of
>> memory couldn't be faulted in so that it could be written to the
>> filesystem.
>>
>
> Thank you for the explanation.
>
> So, firstly, the kernel decides a process needs to be coredumped.
> (That alone is a problem for me - why would that happen?)
> And secondly, the attempt to coredump the process fails. Right?

Yep.

>> > What does this indicate? Is my RAM bad? Is my CF card bad?
>> > Could someone more knowledgeable please explain the above
>> > messages in detail?
>>
>> The inability to fault in memory that the kernel thinks should be
>> there makes me wonder if you're swapping and the device you're
>> swapping to is failing. Your dmesg suggests you might be swapping to
>> your CF card and you (only?) have 128MB of real memory. When this is
>> happening, what's the output of "swapctl -l"? If that shows you are
>> indeed into swap, then a failing CF card would be my guess.
>
> Yes, the machine only has 128MB of memory - which I think should be
> enough for what it does: NATing pf, dhcpd and resolver for the
> internal network, and postfix and httpd for my domain (which
> amounts to almost no traffic).

Have you monitored the memory usage to confirm or deny your belief
that it's sufficient?

> It does not have any swap configured. In fact, I try to design
> my systems so that they don't ever need to swap.
>
> $ swapctl -l
> swapctl: no swap devices configured
>
> Would you please care to explain further how the swapping
> is related to the coredumping EFAULTs?

It was a hypothesis based on the available evidence. Your additional
evidence rules it out, so I see no reason to waste our time explaining
it.

At this point, I suggest you gather data about the system and see if
there's a correlation between the data and when this occurs. Then
make a hypothesis from that, figure out a way to test it, etc. In
short, use *SCIENCE* on it!

Philip Guenther
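
As one possible starting point for the data gathering suggested above, here is a minimal Python sketch that pulls the fields out of kernel messages shaped like the one quoted in this thread and tallies them per command. It is not part of the original exchange. The regular expression is derived only from that single quoted line, so it may not match every variant the kernel emits. The field names (size, addr, offset) and the helper names parse_coredump_failure and failures_by_command are hypothetical, and reading the number after "at" as an offset is an assumption.

```python
import errno
import re
from collections import Counter

# Built from the one message quoted in the thread:
#   Feb 19 14:43:17 www /bsd: pid 26001 (bogofilter): user write of 4096@0x3d5b000 at 1776 failed: 14
# Field names are illustrative guesses, not kernel terminology.
COREDUMP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+/bsd:\s+"
    r"pid\s+(?P<pid>\d+)\s+\((?P<comm>[^)]+)\):\s+"
    r"user write of\s+(?P<size>\d+)@(?P<addr>0x[0-9a-fA-F]+)\s+"
    r"at\s+(?P<offset>\d+)\s+failed:\s+(?P<err>\d+)$"
)


def parse_coredump_failure(line):
    """Return a dict of fields for a matching log line, or None otherwise."""
    match = COREDUMP_RE.match(line.strip())
    if match is None:
        return None
    err = int(match.group("err"))
    return {
        "stamp": match.group("stamp"),
        "host": match.group("host"),
        "pid": int(match.group("pid")),
        "comm": match.group("comm"),
        "size": int(match.group("size")),
        "addr": int(match.group("addr"), 16),
        "offset": int(match.group("offset")),  # assumed meaning of "at N"
        "errno": err,
        # Symbolic name as known to the platform running this script.
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


def failures_by_command(lines):
    """Count matching failure messages per command name."""
    counts = Counter()
    for line in lines:
        record = parse_coredump_failure(line)
        if record is not None:
            counts[record["comm"]] += 1
    return counts
```

Feeding it the lines of a saved copy of the system log, and comparing the timestamps and command names against whatever else is being recorded (memory usage, for instance), is one way to look for the correlation mentioned above.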

