Here at LLNL we know of at least one cause of the 100% CPU utilization observed on LNET gigE<->elan routers during a heavy read load. We tracked the issue back to a tuning our e1000 network cards needed to prevent RX descriptor overflow. While we mainly saw this on our router nodes, I see no reason it couldn't manifest itself on a heavily loaded TCP-attached client or server. The details of this bug, and a debug patch that logs RX descriptor overflows when they occur, can be found in bug 10077.
https://bugzilla.lustre.org/show_bug.cgi?id=10077

At LLNL we now make a habit of maxing out the RX descriptor queue on any Lustre node using an e1000 adapter. Add the module option 'RxDescriptors=4096' to your modprobe.conf, or use 'ethtool -G ethX rx 4096' to tune it on the fly. I have not yet investigated whether the tg3 or bcm5700 drivers suffer from similar issues. They may.

--
Good luck,
Brian

> On 2006-11-15, at 12:36, Peter Bojanic wrote:
>
> Shane,
>
> Regarding the read performance issues you mentioned today...
>
> Some further news...
>
> - Sandia (Redstorm) reports 100% CPU utilization during reads on both
>   Linux and Cray systems; they're running Unicos 1.4, based on Lustre
>   1.4.6
>
>   ===> Lee, is there a Bugzilla reported by Sandia that describes this
>   issue?
>
> - Indiana University reports a similar read performance drop-off but
>   with Lustre 1.4.7.x. I've escalated the following Bugzilla to our IO
>   team with the highest priority:
>
>   Bug 11194: simultaneous reads of same one stripe file show decrease
>   in performance
>
>   ===> mjmac, can you please make sure we've got Cluster Surveys for
>   both Indiana University and for Rizzo at ORNL so engineering can
>   understand the scope of these systems
>
> Thanks,
> Peter
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
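
For anyone who wants to script the RX ring tuning Brian describes, here is a minimal sketch. It assumes the usual `ethtool -g` output layout (a "Pre-set maximums:" section followed by a "Current hardware settings:" section, each with an "RX:" line). The function names `rx_ring_sizes` and `maximize_rx_ring` are made up for illustration and are not part of ethtool or Lustre.

```shell
#!/bin/sh
# Sketch only: read `ethtool -g <iface>` output on stdin and print
# "<current RX ring size> <maximum RX ring size>".
rx_ring_sizes() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        /^RX:/ {
            # Matches only the plain "RX:" line, not "RX Mini:" or "RX Jumbo:".
            if (section == "max") max = $2
            if (section == "cur") cur = $2
        }
        END { print cur, max }
    '
}

# Hypothetical helper: raise the RX ring of interface $1 to the maximum
# the hardware reports, if it is currently smaller. Needs root.
maximize_rx_ring() {
    iface=$1
    set -- $(ethtool -g "$iface" | rx_ring_sizes)
    # Bail out if either value could not be parsed.
    [ -n "$1" ] && [ -n "$2" ] || return 1
    if [ "$1" -lt "$2" ]; then
        ethtool -G "$iface" rx "$2"
    fi
}
```

Calling `maximize_rx_ring eth0` (interface name is a placeholder) would then have the same effect as the 'ethtool -G ethX rx 4096' command above on a card whose maximum is 4096. A change made with ethtool does not survive a driver reload, which is why the modprobe.conf option is the persistent route.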
