On Sun, Feb 19, 2012 at 11:24 AM, Jan Stary <[email protected]> wrote:
> On Feb 19 10:12:03, Philip Guenther wrote:
>> On Sun, Feb 19, 2012 at 6:17 AM, Jan Stary <[email protected]> wrote:
>> > On a recent install of current/i386 on an ALIX (see dmesg below),
>> > processes (such as a simple 'ls') started to magically segfault and die.
>> >
>> > Feb 19 14:43:17 www /bsd: pid 26001 (bogofilter): user write of 4096@0x3d5b000 at 1776 failed: 14
>>
>> 14 == EFAULT.  Those are generated when the kernel tries to write out
>> a process's memory image for a coredump and the indicated range of
>> memory couldn't be faulted in so that it could be written to the
>> filesystem.
>>
>
> Thank you for the explanation.
>
> So, firstly, the kernel decides a process needs to be coredumped.
> (That alone is a problem for me - why would that happen?)
> And secondly, the attempt to coredump the process fails. Right?

Yep.
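
A minimal sketch for decoding the trailing error number in a message like the "user write of ... failed: 14" line above. It assumes only the standard <errno.h> and strerror(3); the helper name describe_errno() is hypothetical and not part of any system interface. On OpenBSD (and most Unix-like systems) 14 is EFAULT, "Bad address".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn an errno value (e.g. the "14" at the end
 * of a kernel coredump write failure) into "14 (EFAULT): Bad address".
 * Returns the snprintf(3) result, i.e. the length that was needed.
 */
int describe_errno(int code, char *buf, size_t len)
{
	/* Only EFAULT is named symbolically here; other codes get "errno". */
	const char *name = (code == EFAULT) ? "EFAULT" : "errno";

	return snprintf(buf, len, "%d (%s): %s", code, name, strerror(code));
}
```

Calling describe_errno(14, buf, sizeof(buf)) fills buf with the numeric code, its symbolic name, and the system's strerror() text for it.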


>> > What does this indicate? Is my RAM bad? Is my CF card bad?
>> > Could someone more knowledgeable please explain the above
>> > messages in detail?
>>
>> The inability to fault in memory that the kernel thinks should be
>> there makes me wonder if you're swapping and the device you're
>> swapping to is failing. Your dmesg suggests you might be swapping to
>> your CF card and you (only?) have 128MB of real memory.  When this is
>> happening, what's the output of "swapctl -l"?  If that shows you are
>> indeed into swap, then a failing CF card would be my guess.
>
> Yes, the machine only has 128MB of memory - which I think should be
> enough for what it does: NATing pf, dhcpd and resolver for the
> internal network, and postfix and httpd for my domain (which
> amounts to almost no traffic).

Have you monitored the memory usage to confirm or deny your belief
that it's sufficient?


> It does not have any swap configured. In fact, I try to design
> my systems so that they don't ever need to swap.
>
>  $ swapctl -l
>  swapctl: no swap devices configured
>
> Would you please care to explain further how the swapping
> is related to the coredumping EFAULTs?

It was a hypothesis based on the available evidence.  Your additional
evidence rules it out, so I see no reason to waste our time explaining
it.

At this point, I suggest you gather data about the system and see if
there's a correlation between the data and when this occurs.  Then
make a hypothesis from that, figure out a way to test it, etc.  In
short, use *SCIENCE* on it!


Philip Guenther
