On Dec 17, 2013, at 10:29 AM, Sten Wolf <[email protected]> wrote:
I'm afraid I don't have any suggested solutions to your problem, but I did notice something about your lnet selftest script.

> lst add_group servers 10.0.0.[22,23]@tcp
> lst add_group readers 10.0.0.[22,23]@tcp
> lst add_group writers 10.0.0.[22,23]@tcp
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --from readers --to servers \
> brw read check=simple size=1M
> lst add_test --batch bulk_rw --from writers --to servers \
> brw write check=full size=4K

You may want to try swapping the order of the nids in the "servers" group. If I recall correctly, the default distribution method for lnet selftest is 1:1. This means that your clients and servers will be paired like this:

10.0.0.22@tcp <--> 10.0.0.22@tcp
10.0.0.23@tcp <--> 10.0.0.23@tcp

So you are not testing any lnet traffic between nodes. (That being said, the lnet connectivity between your nodes is still probably fine; otherwise the lnet selftest would likely not have run at all.)

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
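
For anyone reading this in the archive, the 1:1 pairing described in the reply above can be modeled with a short sketch. This is only an illustration of the behavior as described, not lnet selftest's actual code. The function name pair_one_to_one is hypothetical, and wrapping around the sink list when the groups differ in size is an assumption made for the example.

```python
# Illustrative model only: mimics the 1:1 pairing described in the reply.
# It is NOT lnet selftest's implementation; the function name is hypothetical.
def pair_one_to_one(sources, sinks):
    """Pair the i-th source with the i-th sink.

    Wrapping around the sink list (i % len(sinks)) is an assumption
    made here so the sketch also handles groups of unequal size.
    """
    return [(src, sinks[i % len(sinks)]) for i, src in enumerate(sources)]


clients = ["10.0.0.22@tcp", "10.0.0.23@tcp"]

# Same order in both groups: every node ends up paired with itself.
same_order = pair_one_to_one(clients, ["10.0.0.22@tcp", "10.0.0.23@tcp"])

# "servers" group listed in reverse order: every node is paired with the other.
swapped = pair_one_to_one(clients, ["10.0.0.23@tcp", "10.0.0.22@tcp"])

for src, dst in swapped:
    print(f"{src} <--> {dst}")
```

Under this model, listing the "servers" group in the opposite order is what turns the self-pairs into cross-node pairs, which is the point of the suggested swap.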
