Mark Watts wrote:
On Thu, 2009-09-17 at 15:29 -0700, Dave Buechler wrote:
Memo to Self: Self, try Reply-All if you want more than one person to look at your problem. ;-)

Original message:


Ok.  I have six hours of log data, but it all says pretty much the same
thing.  Here's the first 15 minutes or so.  If you want more, I'll send
more.

Thanks for looking at this.


Looks like you ran out of RAM (and swap) and the Out Of Memory (OOM)
killer started taking a hatchet to your system.

I wouldn't have thought DRBD itself was responsible for this; DRBD says
it's synced OK, so it's unlikely that there would be any outstanding writes
being buffered. I suspect something else was eating all your RAM.

Mark.
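To chase the suspicion that something other than DRBD was eating the RAM, one option is to list the biggest processes straight from /proc. This is a hypothetical Python sketch, not something from the thread: it reads the `VmRSS` and, where the kernel provides it, `VmSwap` fields of each `/proc/<pid>/status` file and sorts by their sum. `VmSwap` is absent on older kernels, in which case it counts as 0. The function names are made up for illustration.

```python
import os

def parse_status_kb(status_text, key):
    """Return the value (kB) of a field such as "VmRSS" from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name == key:
            # Lines look like "VmRSS:\t   12345 kB"
            return int(rest.split()[0])
    return 0  # kernel threads (and older kernels, for VmSwap) omit the field

def top_memory(limit=10, proc="/proc"):
    """Return (rss_kb + swap_kb, pid, command name) for the largest processes."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited or we lack permission
        name = ""
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.partition(":")[2].strip()
                break
        total = parse_status_kb(text, "VmRSS") + parse_status_kb(text, "VmSwap")
        rows.append((total, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]

# Only walk /proc when run directly on a system that has it (i.e. Linux)
if __name__ == "__main__" and os.path.isdir("/proc"):
    for kb, pid, name in top_memory():
        print(f"{kb:>10} kB  {pid:>7}  {name}")
```

Running it from cron every few minutes (with output appended to a file) would leave a trail showing which process grew before the next OOM.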

That's a fairly accurate description, considering there wasn't much left of it when I finally got to the console. ;-)

I'm not entirely certain HOW the system ran out of memory, considering it has 2 GB of RAM and was the secondary node, meaning that apart from DRBD the system is supposed to be idle. It looks like I'm going to have to put this server back into production and keep a close eye on it.
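For keeping a close eye on it, a small poller over /proc/meminfo could record free memory and swap and flag when swap runs low, since the OOM report in the quoted log shows swap completely exhausted (`Free swap = 0kB`). A minimal Python sketch; the 25% threshold, the 60-second interval and the function names are arbitrary assumptions, not anything the thread specifies.

```python
import time

def read_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def swap_warning(info, min_free_pct=25):
    """Return a warning string when free swap drops below min_free_pct of the total, else None."""
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    if total and free * 100 < total * min_free_pct:
        return f"low swap: {free} of {total} kB free"
    return None

def monitor(interval=60, path="/proc/meminfo"):
    """Print MemFree/SwapFree every `interval` seconds, plus a warning when swap runs low."""
    while True:
        with open(path) as f:
            info = read_meminfo(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "MemFree:", info.get("MemFree"), "kB",
              "SwapFree:", info.get("SwapFree"), "kB", flush=True)
        warning = swap_warning(info)
        if warning:
            print(stamp, "WARNING:", warning, flush=True)
        time.sleep(interval)
```

Calling `monitor()` from a script started under nohup, with stdout redirected to a file, is enough; it loops until killed.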

One of the things that concerned me was that it appeared to lose network connectivity before it crashed. Is it possible that I'm looking at a hardware fault?
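To check whether the link really dropped before the OOM killer started, it can help to pull the DRBD connection-state changes out of the log in order. A rough Python sketch, assuming each syslog entry sits on a single line (as it does in the log file itself) with the classic "Mon DD HH:MM:SS" timestamp; the regex is written against the `conn( Old -> New )` messages quoted in this thread, not against any documented DRBD log format, and the script name below is hypothetical.

```python
import re
import sys

# Matches e.g. "block drbd0: ... conn( Connected -> BrokenPipe )"
CONN_RE = re.compile(r"block (drbd\d+): .*?\bconn\( (\w+) -> (\w+) \)")

def conn_transitions(lines):
    """Return (timestamp, device, old_state, new_state) for each DRBD conn( ) change."""
    events = []
    for line in lines:
        m = CONN_RE.search(line)
        if m:
            # Classic syslog timestamps occupy the first 15 characters, e.g. "Sep 15 04:31:42"
            events.append((line[:15], m.group(1), m.group(2), m.group(3)))
    return events

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for stamp, dev, old, new in conn_transitions(f):
            print(f"{stamp}  {dev}: {old} -> {new}")
```

Usage: `python3 drbd_conn.py /var/log/messages`. On the excerpt quoted here, the first transition is Connected -> BrokenPipe at 04:31:42, one second after "sock was shut down by peer", and the resource is back to Connected after the 4360 KB resync logged in the same second.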

--
Regards,
David A. Buechler
System Administrator,
CableAmerica Missouri




/var/log/messages:
Sep 15 04:17:38 smtp2 heartbeat: [2717]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 150 ms (> 50 ms) (GSource: 0x1aacc9f8)
Sep 15 04:19:38 smtp2 snmpd[2489]: Connection from UDP: [206.16.46.12]:46136
Sep 15 04:19:38 smtp2 snmpd[2489]: Received SNMP packet(s) from UDP: [206.16.46.12]:46136
Sep 15 04:19:42 smtp2 snmpd[2489]: Connection from UDP: [206.16.46.12]:46136
Sep 15 04:19:49 smtp2 last message repeated 35 times
Sep 15 04:31:41 smtp2 kernel: block drbd0: sock was shut down by peer
Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Sep 15 04:31:42 smtp2 kernel: block drbd0: short read expecting header on sock: r=0
Sep 15 04:31:42 smtp2 kernel: block drbd0: md_sync_timer expired! Worker calls drbd_md_sync().
Sep 15 04:31:42 smtp2 kernel: block drbd0: msock_sendmsg returned -32
Sep 15 04:31:42 smtp2 kernel: block drbd0: short sent PingAck size=8 sent=0
Sep 15 04:31:42 smtp2 kernel: block drbd0: asender terminated
Sep 15 04:31:42 smtp2 kernel: block drbd0: Terminating asender thread
Sep 15 04:31:42 smtp2 kernel: block drbd0: Connection closed
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
Sep 15 04:31:42 smtp2 kernel: block drbd0: receiver terminated
Sep 15 04:31:42 smtp2 kernel: block drbd0: Restarting receiver thread
Sep 15 04:31:42 smtp2 kernel: block drbd0: receiver (re)started
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Sep 15 04:31:42 smtp2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Sep 15 04:31:42 smtp2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [2581])
Sep 15 04:31:42 smtp2 kernel: block drbd0: data-integrity-alg: <not-used>
Sep 15 04:31:42 smtp2 kernel: block drbd0: drbd_sync_handshake:
Sep 15 04:31:42 smtp2 kernel: block drbd0: self E6B7DB61310BE3A6:0000000000000000:B939FB24AD00D552:7E8DF909A8E37EDF bits:0 flags:0
Sep 15 04:31:42 smtp2 kernel: block drbd0: peer E720051EDAD32591:E6B7DB61310BE3A7:B939FB24AD00D552:7E8DF909A8E37EDF bits:1090 flags:0
Sep 15 04:31:42 smtp2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Sep 15 04:31:42 smtp2 kernel: block drbd0: md_sync_timer expired! Worker calls drbd_md_sync().
Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Sep 15 04:31:42 smtp2 kernel: block drbd0: Began resync as SyncTarget (will sync 4360 KB [1090 bits set]).
Sep 15 04:31:42 smtp2 kernel: block drbd0: Resync done (total 3 sec; paused 0 sec; 1452 K/sec)
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Sep 15 04:31:42 smtp2 kernel: drbdadm invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
Sep 15 04:31:42 smtp2 kernel:
Sep 15 04:31:42 smtp2 kernel: Call Trace:
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800c3bda>] out_of_memory+0x8e/0x2f3
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8000f2ea>] __alloc_pages+0x245/0x2ce
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002309f>] alloc_page_interleave+0x3d/0x74
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002b5e5>] get_zeroed_page+0x21/0x82
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002d80c>] __pmd_alloc+0x14/0x8c
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80008127>] copy_page_range+0x1f3/0x73e
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800d6f0e>] alternate_node_alloc+0x70/0x8c
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8001f847>] copy_process+0xd30/0x15b8
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80030d3a>] do_fork+0x69/0x1c1
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005d427>] ptregscall_common+0x67/0xac
Sep 15 04:31:42 smtp2 kernel:
Sep 15 04:31:42 smtp2 kernel: Mem-info:
Sep 15 04:31:42 smtp2 kernel: Node 0 DMA per-cpu:
Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 0, batch 1 used:0
Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 0, batch 1 used:0
Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 0, batch 1 used:0
Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 0, batch 1 used:0
Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 per-cpu:
Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 186, batch 31 used:48
Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 62, batch 15 used:40
Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 186, batch 31 used:156
Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 62, batch 15 used:60
Sep 15 04:31:42 smtp2 kernel: Node 0 Normal per-cpu: empty
Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem per-cpu: empty
Sep 15 04:31:42 smtp2 kernel: Free pages:        7860kB (0kB HighMem)
Sep 15 04:31:42 smtp2 kernel: Active:222971 inactive:131062 dirty:0 writeback:0 unstable:0 free:1965 slab:32477 mapped-file:2705 mapped-anon:353176 pagetables:97307
Sep 15 04:31:42 smtp2 kernel: Node 0 DMA free:2152kB min:28kB low:32kB high:40kB active:0kB inactive:0kB present:10716kB pages_scanned:0 all_unreclaimable? yes
Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 2003 2003 2003
Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 free:5708kB min:5708kB low:7132kB high:8560kB active:891884kB inactive:524248kB present:2051184kB pages_scanned:2988074 all_unreclaimable? yes
Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
Sep 15 04:31:42 smtp2 kernel: Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
Sep 15 04:31:42 smtp2 kernel: Node 0 DMA: 2*4kB 4*8kB 2*16kB 5*32kB 4*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2152kB
Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32: 129*4kB 1*8kB 2*16kB 13*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 5708kB
Sep 15 04:31:42 smtp2 kernel: Node 0 Normal: empty
Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem: empty
Sep 15 04:31:42 smtp2 kernel: 6436 pagecache pages
Sep 15 04:31:42 smtp2 kernel: Swap cache: add 613470, delete 609770, find 99957/110522, race 3+8
Sep 15 04:31:42 smtp2 kernel: Free swap  = 0kB
Sep 15 04:31:42 smtp2 kernel: Total swap = 2096472kB
Sep 15 04:31:42 smtp2 kernel: Free swap:            0kB
Sep 15 04:31:42 smtp2 kernel: 524000 pages of RAM
Sep 15 04:31:42 smtp2 kernel: 9347 reserved pages
Sep 15 04:31:42 smtp2 kernel: 1057738 pages shared
Sep 15 04:31:42 smtp2 kernel: 3700 pages swap cached
Sep 15 04:31:42 smtp2 kernel: Out of memory: Killed process 3250 (crond).

<Cut for length - DAB>
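The Mem-info dump is easier to read in kilobytes. A quick Python check, assuming 4 KiB pages (consistent with the `129*4kB` buddy-list entries): the `Active`, `inactive` and `free` page counts convert to exactly the 891884kB, 524248kB and 7860kB reported elsewhere in the dump, anonymous mappings come to about 1.35 GB (1412704 kB) and page tables alone to about 380 MB (389228 kB), on a box with 524000 pages (about 2 GB) of RAM and all 2 GB of swap used. A page-table footprint that size hints at many processes or large mappings rather than DRBD buffering, though the log alone can't say which process was responsible. The helper names are hypothetical.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, consistent with the "129*4kB" buddy-list entries

def pages_to_kb(summary):
    """Turn 'name:pages' pairs from the OOM summary line into {name: kB}."""
    return {name: int(pages) * PAGE_KB
            for name, pages in re.findall(r"([\w-]+):(\d+)", summary)}

def buddy_total_kb(line):
    """Sum the 'count*sizekB' free-list entries of a buddy-allocator line."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))

summary = ("Active:222971 inactive:131062 dirty:0 writeback:0 unstable:0 free:1965 "
           "slab:32477 mapped-file:2705 mapped-anon:353176 pagetables:97307")
kb = pages_to_kb(summary)
print(kb["Active"], kb["inactive"], kb["free"])  # 891884 524248 7860
print(kb["mapped-anon"], kb["pagetables"])       # 1412704 389228
print(buddy_total_kb("Node 0 DMA32: 129*4kB 1*8kB 2*16kB 13*32kB 4*64kB 1*128kB "
                     "1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 5708kB"))  # 5708
```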


Andrea Dell'Amico wrote:
On Wed, 2009-09-16 at 03:23 -0700, David Buechler wrote:
Hi Andrea!

Heh, yeah, I should have specified that. No, I'm not using Xen on this particular system. I'm using Xen on other systems under DRBD 8.2 and they're pretty stable.
Ugh. A new kind of crash, not reassuring :-).
I think you should post the kernel Oops on the list, then.

--
Regards,
David A. Buechler
System Administrator,
CableAmerica Missouri
ciao
andrea







_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
