I'm not aware of this kind of defect, and it seems this should not happen. However, without more data we cannot tell what happened. I suggest you open a PMR for your issue. Thanks.
Regards,
The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: "CAPIT, NICOLAS" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 06/27/2017 02:59 PM
Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads
Sent by: [email protected]

Hello,

When the node is locked up there are no waiters ("mmdiag --waiters" or "mmfsadm dump waiters").
There is nothing in the GPFS log file "/var/mmfs/gen/mmfslog", and nothing in the dmesg output or the system log.
The "mmgetstate" command reports the node as "active".
The only symptom is the freeze of the FS.

Best regards,
Nicolas Capit

________________________________________
From: [email protected] [[email protected]] on behalf of Aaron Knister [[email protected]]
Sent: Tuesday, June 27, 2017 01:57
To: [email protected]
Subject: Re: [gpfsug-discuss] FS freeze on client nodes with nbCores>workerThreads

That's a fascinating bug. When the node is locked up, what does "mmdiag --waiters" show from the node in question? I suspect there's more low-level diagnostic data that's helpful for the gurus at IBM, but I'm just curious what the waiters look like.

-Aaron

On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote:
> Hello,
>
> I don't know if this behavior/bug was already reported on this ML, so in
> doubt.
>
> Context:
>
> - SpectrumScale 4.2.2-3
> - client node with 64 cores
> - OS: RHEL7.3
>
> When an MPI job with 64 processes is launched on the node with 64 cores,
> the FS freezes (only the output log file of the MPI job is put on
> the GPFS, so it may be related to the 64 processes writing to the same
> file???).
>
> strace -p 3105 # mmfsd pid stuck
> Process 3105 attached
> wait4(-1, # stuck at this point
>
> strace ls /gpfs
> stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0
> openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC
> # stuck at this point
>
> I have no problem with the other nodes, which have 28 cores.
> The GPFS command mmgetstate is working and I am able to use mmshutdown
> to recover the node.
>
>
> If I set workerThreads=72 on the 64-core node then I am not able to
> reproduce the freeze and I get the right behavior.
>
> Is this a known bug when the number of cores > workerThreads?
>
> Best regards,
> --
> *Nicolas Capit*
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
