You might want to check the GPFS logs on node cl003. Often the message "Lost connection to file system daemon." means that the daemon asserted while it was doing something, hence the lost connection. If you check the state immediately after the command fails and see the node in arbitrating mode, that also makes sense, as it is now rejoining the cluster. If you aren't watching carefully you can miss these events because of the way mmfsd resumes the old mounts: you check the node with 'df', see the file system is still mounted, and assume all is well, when in fact mmfsd has died and restarted.
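
Since 'df' alone won't show a daemon restart, one rough cross-check is to compare how long mmfsd has been running against how long the node has been up. The sketch below is only an illustration, not a GPFS tool: the function names and the 10-minute slack are my own assumptions, it relies on Linux `ps` (procps `etimes`) and `/proc/uptime`, and a daemon that was deliberately started long after boot would also be flagged. The GPFS log on the node is still where you confirm what actually happened.

```python
import subprocess


def daemon_restarted(daemon_elapsed_s: float, node_uptime_s: float,
                     slack_s: float = 600.0) -> bool:
    """Heuristic: True if the daemon has been running noticeably less
    time than the node has been up (slack_s is an arbitrary allowance)."""
    return (node_uptime_s - daemon_elapsed_s) > slack_s


def mmfsd_elapsed_seconds() -> float:
    """Elapsed run time in seconds of the oldest mmfsd process, via ps."""
    result = subprocess.run(
        ["ps", "-o", "etimes=", "-C", "mmfsd"],
        capture_output=True, text=True, check=False,
    )
    values = [float(v) for v in result.stdout.split()]
    if not values:
        raise RuntimeError("mmfsd does not appear to be running")
    return max(values)


def node_uptime_seconds() -> float:
    """Node uptime in seconds, read from /proc/uptime (Linux only)."""
    with open("/proc/uptime") as fh:
        return float(fh.read().split()[0])


if __name__ == "__main__":
    elapsed = mmfsd_elapsed_seconds()
    uptime = node_uptime_seconds()
    if daemon_restarted(elapsed, uptime):
        print(f"mmfsd up {elapsed:.0f}s, node up {uptime:.0f}s: "
              "daemon started well after boot, check the GPFS log")
    else:
        print("mmfsd has been running roughly since boot")
```

Run on cl003 right after a failed command, a very small mmfsd elapsed time on a long-running node would fit the "asserted and restarted" explanation above.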

Gordon McPheeters
ALCF Storage
(630) 252-6430
[email protected]

On Jan 5, 2017, at 3:38 PM, [email protected] wrote:

> On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said:
>
>> Looking at this further, the output says "The following disks of home will be formatted on node cl003:" however that node is the node in 'arbitrating' state, so I don't see how that would work.
>
> The bigger question: If it was in "arbitrating", why was it selected as the node to do the formatting?

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
