> > I have about 40 drbd devices per node (primaries and secondaries). Our provider
> > has a lot of network issues, which sometimes cause drbd to disconnect/reconnect
> > very often: about 500 NetworkFailure events in the hour before the last crash:
> > # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
> > 483
> 
> So you are using DRBD with ganeti in a cloud?
> Which cloud?
What do you mean by "which cloud"?
> The most interesting line is before that.
> 
> > Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2 
> 
> > Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G        W  2.6.30-2-amd64 #1 X8STi
> > Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
> 
> Hard out of memory?
> Did you google for "2.6.30 cache_alloc_refill"
> and check that you are not affected by any of those?

Yes, but there is not much out there. Could it be that, because of the many
NetworkFailure/reconnect cycles, the system does not free memory fast enough,
so that when the network or drbd driver asks for memory the allocation fails
and the driver deactivates itself (especially in a special context, like IRQ)?
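
To see whether the crashes line up with bursts of disconnects, the per-hour
count could be taken for the whole log instead of a single hour. A rough
sketch, assuming the classic syslog timestamp layout ("Mar 30 00:52:48 ...");
the function name count_failures_per_hour is made up for illustration:

```shell
# Count "Connected -> NetworkFailure" transitions per hour in a syslog file.
# Assumption: each line starts with "Mon DD HH:MM:SS" (classic syslog format).
# count_failures_per_hour is a hypothetical helper, not part of DRBD.
count_failures_per_hour() {
    logfile="${1:-/var/log/messages}"
    awk '
        /Connected -> NetworkFailure/ {
            split($3, t, ":")               # t[1] is the hour
            key = $1 " " $2 " " t[1]        # e.g. "Mar 30 00"
            if (!(key in n)) order[++k] = key   # remember first-seen order
            n[key]++
        }
        END {
            # Print "<count> <month> <day> <hour>" in log order
            for (i = 1; i <= k; i++) print n[order[i]], order[i]
        }
    ' "$logfile"
}
```

Running count_failures_per_hour /var/log/messages prints one line per hour that
saw at least one transition, which makes it easier to compare the bursts with
the time of the oops.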

Maxence
-- 
Maxence DUNNEWIND
Contact : [email protected]
Site : http://www.dunnewind.net
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533
