Re: [ceph-users] [EXTERNAL] Re: Increase PG number

David Turner Mon, 19 Sep 2016 08:46:34 -0700

We regrettably have to increase PGs in a Ceph cluster this way more often than 
anyone should ever need to.  As such, we have scripted it out.  A basic version 
of the script that should work for you is below.


First, create a function that checks for any PG states you don't want to 
continue through (better than duplicating code).  Second, set the flags so 
your cluster doesn't die while you do this.  Third, set your current and 
destination PG counts in the for loop.  The loop skips any number not 
divisible by 256.  As you've found, increasing by 256 at a time is a good step 
size.  More than that and you'll run into issues of your cluster curling into 
a fetal position and crying.  Each iteration increases pg_num, waits until 
everything has settled, then increases pgp_num.  The seemingly excessive 
sleeps give the cluster time to resolve the blocked requests that will still 
happen during this.  Lastly, unset the flags to let the cluster start moving 
the data around.

One thing to note: in a cluster with 800-1000 HDD OSDs with SSD journals, going 
from 16k to 32k PGs, we set maxbackfills (osd_max_backfills) to 1 during busy 
times and 2 during idle times.  Setting maxbackfills above 2 has not helped us 
increase our PG count any faster.  We have tested maxbackfills of 2 and 5; both 
took the entire weekend to add 4k PGs.  We also do not add all of the PGs at 
once.  We do 4k each weekend and 2k during the week, waiting for the cluster to 
finish each time to give our mon stores a chance to compact before we continue.
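
For what it's worth, the busy/idle switch described above could be sketched roughly like this. The helper names and the 08:00-20:00 "busy" window are assumptions for illustration only, not part of our actual tooling; the only real Ceph call is the `ceph tell ... injectargs` line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: push an osd_max_backfills value to all OSDs at runtime.
set_max_backfills() {
    local n=$1
    ceph tell 'osd.*' injectargs "--osd-max-backfills $n"
}

# Hypothetical helper: print 1 during busy hours and 2 otherwise.
# The 08:00-20:00 window is an assumption; adjust it to your own traffic.
backfills_for_hour() {
    local hour=$1
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]; then
        echo 1
    else
        echo 2
    fi
}

# Example use (e.g. from cron, once an hour):
#   set_max_backfills "$(backfills_for_hour "$(date +%H)")"
```

Values injected this way do not survive an OSD restart, so the same setting would also need to go into ceph.conf if you want it to persist.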



check_health(){
    # Returns 0 if ceph health mentions any of the states in the grep,
    # otherwise 1 (or whatever the grep return code is).
    ceph health | grep 'peering\|stale\|activating\|creating\|down' > /dev/null
    return $?
}

for flag in nobackfill norecover noout nodown
do
    ceph osd set $flag
done

#Set your current and destination pg counts here.
for num in {2048..16384}
do
    [ $(( num % 256 )) -eq 0 ] || continue
    while sleep 10
    do
        check_health
        if [ $? -ne 0 ]
        then
#This assumes your pool is named rbd
            ceph osd pool set rbd pg_num $num
            break
        fi
    done
    sleep 60
    while sleep 10
    do
        check_health
        if [ $? -ne 0 ]
        then
#This assumes your pool is named rbd
            ceph osd pool set rbd pgp_num $num
            break
        fi
    done
    sleep 60
done

for flag in nobackfill norecover noout nodown
do
    ceph osd unset $flag
done
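
One weakness of a script like the one above is that if the middle part fails, the flags stay set. A minimal sketch of one way to guard against that is below; `with_flags` is a hypothetical wrapper name, and this version only covers a failing command, not the script being killed by a signal.

```shell
#!/usr/bin/env bash

# Same flags as in the script above.
FLAGS="nobackfill norecover noout nodown"

set_flags() {
    local flag
    for flag in $FLAGS; do
        ceph osd set "$flag"
    done
}

unset_flags() {
    local flag
    for flag in $FLAGS; do
        ceph osd unset "$flag"
    done
}

# Hypothetical wrapper: set the flags, run the given command, then unset
# the flags again whether or not the command succeeded. The command's own
# exit status is passed back to the caller.
with_flags() {
    local rc=0
    set_flags
    "$@" || rc=$?
    unset_flags
    return "$rc"
}

# Example use, assuming the pg_num/pgp_num loop lives in a function
# called increase_pgs (a name made up here):
#   with_flags increase_pgs
```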





________________________________

David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943

________________________________
From: ceph-users [[email protected]] on behalf of Matteo 
Dacrema [[email protected]]
Sent: Monday, September 19, 2016 2:51 AM
To: Will.Boege; [email protected]
Subject: Re: [ceph-users] [EXTERNAL] Re: Increase PG number

Hi,

I have 3 different clusters.
On the first, I was able to go from 1024 to 2048 PGs with 10 minutes of 
"io freeze".
On the second, I was able to go from 368 to 512 in a second without any 
performance issue, but going from 512 to 1024 took over 20 minutes to create 
the PGs.
The third, which I still have to upgrade, now has 2048 PGs and I have to take 
it to 16384. So what I'm wondering is how to do it with minimum performance 
impact.

Maybe the best way is to increase pg_num and pgp_num by 256 at a time, letting 
the cluster rebalance each time.

Thanks
Matteo

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this email in error please notify the system manager. This 
message contains confidential information and is intended only for the 
individual named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and delete 
this e-mail from your system. If you are not the intended recipient you are 
notified that disclosing, copying, distributing or taking any action in 
reliance on the contents of this information is strictly prohibited.

On 19 Sep 2016, at 05:22, Will.Boege <[email protected]> wrote:

How many PGs do you have - and how many are you increasing it to?

Increasing PG counts can be disruptive if you are increasing by a large 
proportion of the initial count, because of all the PG peering involved.  If 
you are doubling the number of PGs, it might be good to do it in stages to 
minimize peering.  For example, if you are going from 1024 to 2048, consider 4 
increases of 256, allowing the cluster to stabilize in between, rather than 
one event that doubles the number of PGs.
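
As a rough illustration of the staging described above, the intermediate PG counts can be listed with a small shell function. `pg_steps` is a made-up helper name, and the default step of 256 is simply the figure used in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each intermediate PG count on the way from
# the current value to the target, moving in fixed-size steps. The last
# step is clamped so it never overshoots the target.
pg_steps() {
    local current=$1 target=$2 step=${3:-256}
    local n=$current
    while [ "$n" -lt "$target" ]; do
        n=$(( n + step ))
        if [ "$n" -gt "$target" ]; then
            n=$target
        fi
        echo "$n"
    done
}

# Going from 1024 to 2048 in increases of 256 gives four stages:
pg_steps 1024 2048 256
```

Each printed value would be applied to pg_num (and then pgp_num) only after the cluster has stabilized from the previous one.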

If you expect this cluster to grow, overshoot the recommended PG count by 50% 
or so.  This will let you minimize the number of PG increase events, and thus 
the impact on your users.

From: ceph-users <[email protected]> on behalf of Matteo Dacrema <[email protected]>
Date: Sunday, September 18, 2016 at 3:29 PM
To: Goncalo Borges <[email protected]>, "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: [ceph-users] Increase PG number

Hi, thanks for your reply.

Yes, I don't have any near-full OSDs.

The problem is not the rebalancing process but the creation of the new PGs.

I have only 2 hosts running Ceph Firefly, each with 3 SSDs for journaling.
During the creation of new PGs, all the attached volumes stop reading and 
writing and show high iowait.
ceph -s tells me that there are thousands of slow requests.

Once all the PGs are created, the slow requests begin to decrease and the 
cluster starts rebalancing.

Matteo

On 18 Sep 2016, at 13:08, Goncalo Borges <[email protected]> wrote:

Hi
I am assuming that you do not have any near-full OSDs (either before or during 
the PG splitting process) and that your cluster is healthy.

To minimize the impact on clients during recovery or operations like PG 
splitting, it is good to set the following configs. Obviously the whole 
operation will take longer to complete, but the impact on clients will be 
minimized.

#  ceph daemon mon.rccephmon1 config show | egrep "(osd_max_backfills|osd_recovery_threads|osd_recovery_op_priority|osd_client_op_priority|osd_recovery_max_active)"
   "osd_max_backfills": "1",
   "osd_recovery_threads": "1",
   "osd_recovery_max_active": "1"
   "osd_client_op_priority": "63",
   "osd_recovery_op_priority": "1"
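
The command above reads the values from a monitor's admin socket. Since these are OSD options, it may also be worth checking what a given OSD is actually running with. A minimal sketch, assuming it is run on the host that carries the OSD; `show_recovery_settings` is a made-up helper name:

```shell
#!/usr/bin/env bash

# Hypothetical helper: ask one OSD's admin socket for each of the
# throttling options listed above. Takes the numeric OSD id.
show_recovery_settings() {
    local osd_id=$1 opt
    for opt in osd_max_backfills osd_recovery_threads \
               osd_recovery_max_active osd_client_op_priority \
               osd_recovery_op_priority; do
        ceph daemon "osd.$osd_id" config get "$opt"
    done
}

# Example use:
#   show_recovery_settings 0
```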

Cheers
G.
________________________________________
From: ceph-users [[email protected]] on behalf of Matteo Dacrema [[email protected]]
Sent: 18 September 2016 03:42
To: [email protected]
Subject: [ceph-users] Increase PG number

Hi All,

I need to expand my Ceph cluster and I also need to increase the PG number.
In a test environment I see that all read and write operations stop during PG 
creation.

Is that normal behavior?

Thanks
Matteo

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com