Hello,

We are experiencing "PingAck timeout" errors on a system where multiple DRBD
resources are configured (more precisely, a pair of active/active Lustre MDS servers):

A --- drbd0 --- B  [nfs-data] idle
A --- drbd1 --- B  [nfs-apps] idle
A --- drbd2 --- B  [nfs-tmp] idle
A --> drbd3 --> B  [mdt1] heavy load
A <-- drbd4 <-- B  [mdt2] heavy load
A --- drbd5 --- B  [mgs] idle

Our environment is DRBD 8.4.4, with "ping-int = 10" (seconds) and
"ping-timeout = 25" (tenths of a second, i.e. 2.5 s).
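For reference, a minimal sketch of the relevant net options as described above
(the resource name, hostnames, devices and addresses are illustrative, not our
actual configuration):

```
resource nfs-apps {
  net {
    ping-int     10;   # seconds between keep-alive pings on this connection
    ping-timeout 25;   # tenths of a second to wait for PingAck, i.e. 2.5 s
  }
  on mds-a {
    device    /dev/drbd1;
    address   10.0.0.1:7789;   # illustrative IPoIB address
  }
  on mds-b {
    device    /dev/drbd1;
    address   10.0.0.2:7789;
  }
}
```

Note that each resource maintains its own connection and its own ping/PingAck
exchange, even though all of them share the same physical link.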

The link between the two servers is 20Gb/s Infiniband (configured in datagram 
mode).

Strangely, the timeout occurs on an idle resource (e.g. drbd1) when two of the
other resources ('mdt1' and 'mdt2') are heavily loaded (while those loaded
resources show no connection/timeout problems whatsoever).

Looking at the source code, I believe that DRBD cannot know that the shared
link is potentially congested (by the heavily loaded resources 'mdt1' and
'mdt2'), nor anticipate the PingAck timeout this congestion may trigger on
another, idle resource (e.g. 'drbd1'). Am I right?

Is there a way to circumvent this problem?

Thanks and best,

Cédric Dufour
-- 

Cédric Dufour @ Idiap Research Institute

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user