Hello again,
After comparing the DRBD 8.3 and 8.4 source code, I see that conditional
TCP_CORK-ing is still performed in 8.4. Could this be the reason why we
experience PingAck problems on idle resources?
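For clarity, here is a minimal sketch of the kind of corking I mean
(illustrative only, not DRBD's actual code; the helper name is mine):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* TCP_CORK: while set, the kernel holds back partial frames and
     * coalesces writes; clearing it flushes whatever is queued. */
    static void set_cork(int fd, int on)
    {
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
    }

    /* e.g. cork while batching, uncork before waiting on a reply:
     *   set_cork(fd, 1);
     *   send(fd, hdr, hlen, 0); send(fd, data, dlen, 0);
     *   set_cork(fd, 0);
     */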
PS: our cluster was running DRBD 8.3 beforehand and we had no such problem...
but we were also using Infiniband SDP instead of IPoIB (so we cannot tell
whether the problem really lies with DRBD).
Thanks for your insights,
Cédric
On 02/02/14 21:29, Cédric Dufour - Idiap Research Institute wrote:
> Hello,
>
> We are experiencing "PingAck timeout" on a system where multiple DRBD
> resources are configured (more precisely, a pair of active/active Lustre
> MDS servers):
>
> A --- drbd0 --- B [nfs-data] idle
> A --- drbd1 --- B [nfs-apps] idle
> A --- drbd2 --- B [nfs-tmp] idle
> A --> drbd3 --> B [mdt1] heavy load
> A <-- drbd4 <-- B [mdt2] heavy load
> A --- drbd5 --- B [mgs] idle
>
> Our environment is DRBD 8.4.4, with "ping-int" = 10 (seconds) and
> "ping-timeout" = 25 (tenths of a second, i.e. 2.5s).
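>
> For reference, these options live in the net section of our drbd.conf;
> a trimmed sketch (resource 'mdt1' taken from the list above):
>
>     resource mdt1 {
>       net {
>         ping-int     10; # seconds between DRBD keep-alive pings
>         ping-timeout 25; # tenths of a second: 2.5s to wait for the PingAck
>       }
>     }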
>
> The link between the two servers is 20Gb/s InfiniBand, with IPoIB
> configured in datagram mode.
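>
> (The IPoIB mode can be verified with "cat /sys/class/net/ib0/mode" --
> "ib0" being the assumed interface name here.)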
>
> Strangely, the timeout occurs on an idle resource (e.g. drbd1) while two of
> the other resources ('mdt1' and 'mdt2') are heavily loaded (and show no
> connection/timeout problem whatsoever).
>
> Looking at the source code, I believe that DRBD cannot detect that the
> shared link is congested (by the heavily loaded resources 'mdt1' and
> 'mdt2') and that this congestion may delay the PingAck of another, idle
> resource (e.g. 'drbd1') past its timeout. Am I right?
>
> Is there a way to circumvent this problem ?
>
> Thanks and best,
>
> Cédric Dufour
> --
>
> Cédric Dufour @ Idiap Research Institute
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user