On Dec 17, 2013, at 10:29 AM, Sten Wolf <[email protected]> wrote:

I'm afraid I don't have any suggested solutions to your problem, but I did 
notice something about your lnet selftest script.

> lst add_group servers 10.0.0.[22,23]@tcp
> lst add_group readers 10.0.0.[22,23]@tcp
> lst add_group writers 10.0.0.[22,23]@tcp
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --from readers --to servers \
> brw read check=simple size=1M
> lst add_test --batch bulk_rw --from writers --to servers \
> brw write check=full size=4K

You may want to try swapping the order of the nids in the "servers" group.  If 
I recall correctly, the default distribution method for lnet selftest is 1:1.  
This means that your clients and servers will be paired like this:

10.0.0.22@tcp  <-->  10.0.0.22@tcp
10.0.0.23@tcp  <-->  10.0.0.23@tcp

So you are not testing any lnet traffic between nodes.  (That being said, the 
lnet connectivity between your nodes is probably still fine; otherwise, lnet 
selftest would likely not have run at all.)
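As a rough illustration of the pairing above, here is a small Python sketch. It assumes a simplified model of the 1:1 distribution in which the i-th node of the --from group is matched with the i-th node of the --to group. The helper names are hypothetical, and this is not how lnet selftest is implemented internally. It only shows why listing the same NIDs in the same order makes each node test against itself, and why swapping the order of the "servers" group removes that self-pairing.

```python
from typing import List, Tuple


def pair_one_to_one(clients: List[str], servers: List[str]) -> List[Tuple[str, str]]:
    """Pair the i-th client with the i-th server (wrapping if servers run out).

    Hypothetical, simplified model of a 1:1 distribution; not the lnet selftest code.
    """
    return [(c, servers[i % len(servers)]) for i, c in enumerate(clients)]


def self_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the pairs in which a node would only be talking to itself."""
    return [(c, s) for c, s in pairs if c == s]


readers = ["10.0.0.22@tcp", "10.0.0.23@tcp"]
servers_as_written = ["10.0.0.22@tcp", "10.0.0.23@tcp"]
servers_swapped = ["10.0.0.23@tcp", "10.0.0.22@tcp"]

for label, servers in (("as written", servers_as_written), ("swapped", servers_swapped)):
    pairs = pair_one_to_one(readers, servers)
    print(label)
    for c, s in pairs:
        print(f"  {c}  <-->  {s}")
    # Count pairs that generate no traffic between distinct nodes.
    print(f"  self-paired: {len(self_pairs(pairs))}")
```

With the groups as written, both pairs are self-pairs; with the server order swapped, every reader is paired with the other node.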

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
