If it used to work then it's probably not config. Most expels are the result of 
network connectivity problems.

If your cluster is not too big, try pinging from every node to every other 
node and look for large latencies.
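
Something like the rough Python sketch below could do the ping check from one 
node; you would run it on each node in turn (e.g. over ssh). The host names, 
the 1 ms threshold and the function names are all made up for illustration, 
and the regex assumes the usual "min/avg/max/mdev" summary line that Linux 
ping prints.

```python
import re
import subprocess

# Matches the summary line of ping, e.g.
# "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms"
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_avg_rtt(ping_output):
    """Return the average RTT in ms from ping's summary line, or None."""
    match = RTT_RE.search(ping_output)
    return float(match.group(2)) if match else None


def ping_avg_ms(host, count=5):
    """Ping a host and return its average RTT in ms (None if no reply)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-q", host],
        capture_output=True,
        text=True,
    )
    return parse_avg_rtt(result.stdout)


def slow_hosts(latencies, threshold_ms=1.0):
    """Return the hosts that did not answer or exceeded the threshold."""
    return sorted(
        host
        for host, ms in latencies.items()
        if ms is None or ms > threshold_ms
    )


def check_from_this_node(hosts, threshold_ms=1.0):
    """Ping every host in the list from the local node and flag slow ones."""
    latencies = {host: ping_avg_ms(host) for host in hosts}
    return slow_hosts(latencies, threshold_ms)


# Example (hypothetical host names):
#   print(check_from_this_node(["node01", "node02", "node03"]))
```

What counts as a "large" latency depends on your fabric, so pick the 
threshold from what a known-good pair of nodes shows.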

Also look to see who is expelling whom, i.e. whether your RDMA nodes are being 
expelled by non-RDMA nodes. It may point to a weakness in your network which 
GPFS, being as it is a great finder of weaknesses, is having a problem with.
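
Once you have a list of (expeller, expelled) pairs, a small tally like the 
sketch below may make a pattern stand out. It assumes you have already pulled 
the pairs out of the GPFS logs yourself (no log format is assumed here), and 
the node names and the tally_expels helper are hypothetical.

```python
from collections import Counter


def tally_expels(pairs, rdma_nodes):
    """Count expels grouped by (expeller class, expelled class)."""

    def kind(node):
        # Anything not listed as an RDMA node is treated as TCP-only.
        return "rdma" if node in rdma_nodes else "tcp"

    return Counter((kind(expeller), kind(expelled)) for expeller, expelled in pairs)


# Hypothetical data: (expeller, expelled) pairs collected by hand.
pairs = [("nsd1", "c1"), ("nsd1", "c2"), ("c3", "c1")]
rdma_nodes = {"c1", "c2", "c3"}

for (expeller, expelled), count in sorted(tally_expels(pairs, rdma_nodes).items()):
    print(f"{expeller} expelled {expelled}: {count}")
```

If one combination dominates, that is the network path to look at first.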

Also, more details (network config etc.) will elicit more detailed suggestions.

Cheers,

Vic



> On 1 Jul 2015, at 16:52, Chris Hunter <[email protected]> wrote:
> 
> Hi UG list,
> We have a large rdma/tcp multi-cluster gpfs filesystem, about 2/3 of clients 
> use RDMA. We see a large number of expels of rdma clients but less of the tcp 
> clients.
> Most of the gpfs config is at defaults. We are unclear if any of the non-RDMA 
> config items (eg. Idle socket timeout) would help our issue. Any suggestions 
> on gpfs config parameters we should investigate ?
> 
> thank-you in advance,
> chris hunter
> yale hpc group
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
