Nicolas, by chance do you have a Skylake or Kaby Lake based CPU?
Sent from my iPhone

> On Jun 30, 2017, at 02:57, IBM Spectrum Scale <[email protected]> wrote:
>
> I'm not aware of this kind of defect; it should not happen, but without more
> data we don't know what occurred. I suggest you open a PMR for your issue.
> Thanks.
>
> Regards, The Spectrum Scale (GPFS) team
>
> ------------------------------------------------------------------------------------------------------------------
> If you feel that your question can benefit other users of Spectrum Scale
> (GPFS), then please post it to the public IBM developerWorks Forum at
> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
>
> If your query concerns a potential software error in Spectrum Scale (GPFS)
> and you have an IBM software maintenance contract, please contact
> 1-800-237-5511 in the United States or your local IBM Service Center in
> other countries.
>
> The forum is informally monitored as time permits and should not be used for
> priority messages to the Spectrum Scale (GPFS) team.
>
> "CAPIT, NICOLAS" ---06/27/2017 02:59:59 PM---Hello, When the node is locked
> up there are no waiters ("mmdiag --waiters" or "mmfsadm dump waiters")
>
> From: "CAPIT, NICOLAS" <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Date: 06/27/2017 02:59 PM
> Subject: Re: [gpfsug-discuss] FS freeze on client nodes with
> nbCores>workerThreads
> Sent by: [email protected]
>
> Hello,
>
> When the node is locked up there are no waiters ("mmdiag --waiters" or
> "mmfsadm dump waiters").
> There is nothing in the GPFS log file "/var/mmfs/gen/mmfslog", and nothing
> in the dmesg output or the system log.
> The "mmgetstate" command says that the node is "active".
> The only symptom is the freeze of the FS.
>
> Best regards,
> Nicolas Capit
> ________________________________________
> From: [email protected]
> [[email protected]] on behalf of Aaron Knister
> [[email protected]]
> Sent: Tuesday, June 27, 2017 01:57
> To: [email protected]
> Subject: Re: [gpfsug-discuss] FS freeze on client nodes with
> nbCores>workerThreads
>
> That's a fascinating bug. When the node is locked up, what does "mmdiag
> --waiters" show from the node in question? I suspect there's more
> low-level diagnostic data that's helpful for the gurus at IBM, but I'm
> just curious what the waiters look like.
>
> -Aaron
>
> On 6/26/17 3:49 AM, CAPIT, NICOLAS wrote:
> > Hello,
> >
> > I don't know if this behavior/bug was already reported on this ML, so
> > I am posting it just in case.
> >
> > Context:
> >
> > - Spectrum Scale 4.2.2-3
> > - client node with 64 cores
> > - OS: RHEL 7.3
> >
> > When an MPI job with 64 processes is launched on the node with 64 cores,
> > the FS freezes (only the output log file of the MPI job is written on
> > the GPFS, so it may be related to the 64 processes writing to the same
> > file?).
> >
> > strace -p 3105   # mmfsd pid, stuck
> > Process 3105 attached
> > wait4(-1,        # stuck at this point
> >
> > strace ls /gpfs
> > stat("/gpfs", {st_mode=S_IFDIR|0755, st_size=131072, ...}) = 0
> > openat(AT_FDCWD, "/gpfs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC
> >                  # stuck at this point
> >
> > I have no problem with the other nodes, which have 28 cores.
> > The GPFS command mmgetstate is working, and I am able to use mmshutdown
> > to recover the node.
> >
> > If I set workerThreads=72 on the 64-core node, I am no longer able to
> > reproduce the freeze and I get the correct behavior.
> >
> > Is this a known bug when the number of cores > workerThreads?
> >
> > Best regards,
> > --
> > *Nicolas Capit*
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
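The trigger Nicolas describes (64 processes all writing to one shared output file) can be mimicked with a small, GPFS-agnostic sketch. This is not the original MPI job, only a hypothetical stand-in using Python's multiprocessing: the file path and process count are illustrative. On a healthy filesystem it completes immediately; on the affected client node, the analogous writes would hang.

```python
# Hypothetical reproducer sketch: many processes appending lines to one
# shared file, analogous to 64 MPI ranks writing a common output log.
# Not GPFS-specific; file path and process count are illustrative.
import multiprocessing
import os
import tempfile

def writer(path, rank):
    # Each "rank" appends one line, mimicking per-process log output.
    # O_APPEND makes each small write land atomically at end of file.
    with open(path, "a") as f:
        f.write("rank %d writing\n" % rank)

def run(nprocs=64):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    procs = [multiprocessing.Process(target=writer, args=(path, r))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(path) as f:
        nlines = len(f.readlines())
    os.unlink(path)
    return nlines

if __name__ == "__main__":
    print(run())  # 64 if nothing hangs
```

Running this on the 64-core node (with nprocs equal to the core count) while watching "mmdiag --waiters" might help narrow down whether the raw write pattern alone is enough to trigger the lockup, or whether the MPI layer is also involved.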
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
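For reference, the workaround reported in the thread (raising workerThreads above the node's core count) is applied with standard GPFS administration commands. A minimal sketch, assuming a placeholder node name "client64"; the workerThreads change generally takes effect only after GPFS is restarted on the node:

```shell
# Check the current value
mmlsconfig workerThreads

# Raise workerThreads above the core count on the 64-core client
# ("client64" is a placeholder node name)
mmchconfig workerThreads=72 -N client64

# Restart GPFS on that node so the new value takes effect
mmshutdown -N client64
mmstartup -N client64
```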
