And here's the osd tree if it matters.

ID WEIGHT   TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 22.39984 root default
-2 21.39984     host 10
 0  1.06999         osd.0         up  1.00000          1.00000
 1  1.06999         osd.1         up  1.00000          1.00000
 2  1.06999         osd.2         up  1.00000          1.00000
 3  1.06999         osd.3         up  1.00000          1.00000
 4  1.06999         osd.4         up  1.00000          1.00000
 5  1.06999         osd.5         up  1.00000          1.00000
 6  1.06999         osd.6         up  1.00000          1.00000
 7  1.06999         osd.7         up  1.00000          1.00000
 8  1.06999         osd.8         up  1.00000          1.00000
 9  1.06999         osd.9         up  1.00000          1.00000
10  1.06999         osd.10        up  1.00000          1.00000
11  1.06999         osd.11        up  1.00000          1.00000
12  1.06999         osd.12        up  1.00000          1.00000
13  1.06999         osd.13        up  1.00000          1.00000
14  1.06999         osd.14        up  1.00000          1.00000
15  1.06999         osd.15        up  1.00000          1.00000
16  1.06999         osd.16        up  1.00000          1.00000
17  1.06999         osd.17        up  1.00000          1.00000
18  1.06999         osd.18        up  1.00000          1.00000
19  1.06999         osd.19        up  1.00000          1.00000
-3  1.00000     host 148_96
 0  1.00000         osd.0         up  1.00000          1.00000

On Wed, 23 Mar 2016 at 19:10 Zhang Qiang <[email protected]> wrote:

> Oliver, Goncalo,
>
> Sorry to disturb again, but recreating the pool with a smaller pg_num
> didn't seem to work, now all 666 pgs are degraded + undersized.
>
> New status:
>     cluster d2a69513-ad8e-4b25-8f10-69c4041d624d
>      health HEALTH_WARN
>             666 pgs degraded
>             82 pgs stuck unclean
>             666 pgs undersized
>      monmap e5: 5 mons at {1=10.3.138.37:6789/0,2=10.3.138.39:6789/0,3=10.3.138.40:6789/0,4=10.3.138.59:6789/0,GGZ-YG-S0311-PLATFORM-138=10.3.138.36:6789/0}
>             election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
>      osdmap e705: 20 osds: 20 up, 20 in
>       pgmap v1961: 666 pgs, 1 pools, 0 bytes data, 0 objects
>             13223 MB used, 20861 GB / 21991 GB avail
>                  666 active+undersized+degraded
>
> Only one pool and its size is 3. So I think according to the algorithm,
> (20 * 100) / 3 = 666 pgs is reasonable.
>
> I updated health detail and also attached a pg query result on gist(
> https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4).
>
> On Wed, 23 Mar 2016 at 09:01 Dotslash Lu <[email protected]> wrote:
>
>> Hello Gonçalo,
>>
>> Thanks for your reminding. I was just setting up the cluster for test, so
>> don't worry, I can just remove the pool. And I learnt that since the
>> replication number and pool number are related to pg_num, I'll consider
>> them carefully before deploying any data.
>>
>> On Mar 23, 2016, at 6:58 AM, Goncalo Borges <[email protected]>
>> wrote:
>>
>> Hi Zhang...
>>
>> If I can add some more info, the change of PGs is a heavy operation, and
>> as far as i know, you should NEVER decrease PGs. From the notes in pgcalc (
>> http://ceph.com/pgcalc/):
>>
>> "It's also important to know that the PG count can be increased, but
>> NEVER decreased without destroying / recreating the pool. However,
>> increasing the PG Count of a pool is one of the most impactful events in a
>> Ceph Cluster, and should be avoided for production clusters if possible."
>>
>> So, in your case, I would consider in adding more OSDs.
>>
>> Cheers
>> Goncalo
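
For reference, the "(20 * 100) / 3 = 666" arithmetic quoted above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb as stated in the thread, not an official Ceph tool: the function names are made up for this example, and the default of 100 PGs per OSD is simply the figure used in the quoted message. Since Ceph's documentation generally recommends a power-of-two pg_num, the sketch also prints the two neighbouring powers of two without claiming which one is right for a given cluster.

```python
def raw_pg_count(num_osds, pool_size, target_pgs_per_osd=100):
    """Rule-of-thumb PG count: (OSDs * target PGs per OSD) / replica count.

    Hypothetical helper; uses integer division, as in the quoted message.
    """
    return (num_osds * target_pgs_per_osd) // pool_size


def neighbouring_powers_of_two(n):
    """Return (largest power of two <= n, smallest power of two >= n) for n >= 1."""
    lower = 1 << (n.bit_length() - 1)
    upper = lower if lower == n else lower * 2
    return lower, upper


# Values from the thread: 20 OSDs, one pool with size (replica count) 3.
raw = raw_pg_count(num_osds=20, pool_size=3)
lower, upper = neighbouring_powers_of_two(raw)
print("raw:", raw, "neighbouring powers of two:", lower, upper)
```

With the numbers from the thread, the raw value is 666 and the neighbouring powers of two are 512 and 1024.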
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
