Hi Olaf,

I didn’t touch pitWorkerThreadsPerNode … it was already zero.

I’m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or 4.2.0.3 
and are gradually being upgraded).  What version of GPFS fixes this?  With what 
I’m doing I need the ability to run mmrestripefs.

It seems to me that mmrestripefs could check whether QOS is enabled … granted, 
it would have no way of knowing whether the values used actually are reasonable 
or not … but if QOS is enabled then “trust” it to not overrun the system.
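For what it's worth, the QOS throttling I'm relying on was set up roughly like this (the file system name and IOPS value are illustrative, not a recommendation):

```shell
# Enable QOS and cap the maintenance class (mmrestripefs, mmdeldisk,
# mmadddisk, etc.) so maintenance I/O cannot overrun the system.
# "gpfs1" and the 10000 IOPS cap are placeholders for illustration.
mmchqos gpfs1 --enable pool=*,maintenance=10000IOPS

# Watch the effect while a restripe is running
mmlsqos gpfs1 --seconds 60
```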

PMR time?  Thanks..

Kevin

On May 4, 2017, at 10:54 AM, Olaf Weiser <[email protected]> wrote:

Hi Kevin,
the number of NSDs is more or less irrelevant .. what matters is that the number of nodes x pitWorkerThreadsPerNode should not exceed the #mutexes per FS block by too much.
Did you adjust/tune pitWorkerThreadsPerNode? ...
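For reference, the tuning Olaf asks about would look something like this (the node names are hypothetical; check your current values first):

```shell
# Show the current setting, if it has been changed from the default
# (an unset/0 value means GPFS uses its internal default).
mmlsconfig pitWorkerThreadsPerNode

# Lower the per-node PIT worker count on the participating NSD servers
# so that (participating nodes x pitWorkerThreadsPerNode) stays under
# the limit of 31. Node names here are placeholders.
mmchconfig pitWorkerThreadsPerNode=4 -N nsd1,nsd2,nsd3,nsd4
```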

As far as I know, the fact that the code checks the number of NSDs is already considered a defect and will be fixed / is already fixed (I stepped into it here as well).

PS: QOS is the better approach to address this, but unfortunately not everyone uses it by default... that's why I suspect development decided to put in a check/limit here, which in your case (with QOS) wouldn't be needed.





From:        "Buterbaugh, Kevin L" <[email protected]>
To:        gpfsug main discussion list <[email protected]>
Date:        05/04/2017 05:44 PM
Subject:        Re: [gpfsug-discuss] Well, this is the pits...
Sent by:        [email protected]
________________________________



Hi Olaf,

Your explanation mostly makes sense, but...

Failed with 4 nodes … failed with 2 nodes … not gonna try with 1 node.  And 
this filesystem only has 32 disks, which I would imagine is not an especially 
large number compared to what some people reading this e-mail have in their 
filesystems.

I thought that QOS (which I’m using) was what would keep an mmrestripefs from 
overrunning the system … QOS has worked extremely well for us - it’s one of my 
favorite additions to GPFS.

Kevin

On May 4, 2017, at 10:34 AM, Olaf Weiser <[email protected]> wrote:

No .. it is just in the code, because we have to avoid running out of mutexes / block.

Reducing the number of nodes with -N down to 4 (2 nodes is even safer) is the easiest way to solve it for now...

I've been told the real root cause will be fixed in one of the next PTFs .. within this year.
This warning message itself should appear every time, but unfortunately it was coded to depend on the number of disks (NSDs) .. that's why I suspect you didn't see it before.
But the fact remains that we have to make sure not to overrun the system with mmrestripefs .. so please lower the -N number of nodes to 4, or better 2

(even though we know that the mmrestripe will then take longer)
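Concretely, Olaf's workaround amounts to something like the following (file system and node names are placeholders):

```shell
# Rerun the balance restripe, but restrict the participating nodes to
# two so that the total PIT worker count stays under the limit.
# Note: by default the file system manager also counts as a participant.
mmrestripefs gpfs1 -b -P capacity -N nsd1,nsd2
```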


From:        "Buterbaugh, Kevin L" <[email protected]>
To:        gpfsug main discussion list <[email protected]>
Date:        05/04/2017 05:26 PM
Subject:        [gpfsug-discuss] Well, this is the pits...
Sent by:        [email protected]
________________________________



Hi All,

Another one of those, “I can open a PMR if I need to” type questions…

We are in the process of combining two large GPFS filesystems into one new 
filesystem (for various reasons I won’t get into here).  Therefore, I’m doing a 
lot of mmrestripe’s, mmdeldisk’s, and mmadddisk’s.

Yesterday I did an “mmrestripefs <old fs> -r -N <my 8 NSD servers>” (after 
suspending a disk, of course).  Worked like it should.

Today I did a “mmrestripefs <new fs> -b -P capacity -N <those same 8 NSD 
servers>” and got:

mmrestripefs: The total number of PIT worker threads of all participating nodes 
has been exceeded to safely restripe the file system.  The total number of PIT 
worker threads, which is the sum of pitWorkerThreadsPerNode of the 
participating nodes, cannot exceed 31.  Reissue the command with a smaller set 
of participating nodes (-N option) and/or lower the pitWorkerThreadsPerNode 
configure setting.  By default the file system manager node is counted as a 
participating node.
mmrestripefs: Command failed. Examine previous error messages to determine 
cause.

So there must be some difference in how the “-r” and “-b” options calculate the 
number of PIT worker threads.  I did an “mmfsadm dump all | grep 
pitWorkerThreadsPerNode” on all 8 NSD servers and the filesystem manager node … 
they all say the same thing:

  pitWorkerThreadsPerNode 0

Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!?  I’m confused...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615) 875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss







