Hello, we see peaks of up to 30% packet loss in our Gigabit network. They are sporadic, roughly once an hour, each lasting some seconds; we cannot say whether they ever extend into minutes. We attribute this to very high peak load on the network.
Could the Lustre 'reconnect' messages or the 'lnet_try_match_md()' messages be correlated with this? I.e., the MDS has problems matching information between the OSTs and the MGS ... What happens inside Lustre when it runs into packet loss on the network? Are there any timeout/retry counters?

Regards
Heiko
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
