You might want to check the GPFS logs on node cl003.  Often the message
"Lost connection to file system daemon." means that the daemon asserted while
it was doing something, hence the lost connection.
If you check the state immediately after the command fails and see the node in
arbitrating mode, that also makes sense, as it is now rejoining the cluster.
If you aren't watching carefully you can miss these events because of the way
mmfsd resumes the old mounts: you check the node with 'df', see that the file
system is still mounted, and assume all is well, when in fact mmfsd has died
and restarted.
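
Since 'df' alone won't show a daemon restart, here is a rough sketch of the log check in Python. It is not a GPFS tool. The log path is the customary location and may differ on your install, and the search terms are my own illustrative guesses rather than an official list of mmfsd messages; find_suspect_lines is a made-up helper name.

```python
from pathlib import Path

# Assumption: customary GPFS log location; adjust for your installation.
DEFAULT_LOG = Path("/var/adm/ras/mmfs.log.latest")

# Assumption: illustrative search terms, not an official list of mmfsd messages.
DEFAULT_KEYWORDS = ("assert", "lost connection", "arbitrating")


def find_suspect_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Return (line_number, text) pairs whose text contains any keyword.

    Matching is a case-insensitive substring test; line numbers start at 1.
    """
    wanted = [k.lower() for k in keywords]
    hits = []
    for number, text in enumerate(lines, start=1):
        lowered = text.lower()
        if any(k in lowered for k in wanted):
            hits.append((number, text.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if DEFAULT_LOG.exists():
        # errors="replace" keeps the scan going if the log has odd bytes.
        with DEFAULT_LOG.open(errors="replace") as handle:
            for number, text in find_suspect_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"{DEFAULT_LOG} not found; point DEFAULT_LOG at your node's log")
```

Run it on cl003 right after the command fails. Any hits are only a pointer to where to start reading the log, not proof that mmfsd restarted.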


Gordon McPheeters
ALCF Storage
(630) 252-6430
[email protected]



On Jan 5, 2017, at 3:38 PM, [email protected] wrote:

On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said:

Looking at this further, the output says "The following disks of home
will be formatted on node cl003:"; however, that node is the one in the
'arbitrating' state, so I don't see how that would work.

The bigger question:  If it was in "arbitrating", why was it selected as
the node to do the formatting?
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
