I spotted a section in the pg query output about dirty objects, so I've
looked into that. The Ceph documentation is very light on this, but I found
the osd and pg repair commands. I issued the osd repair command and have now
reduced the number of unclean pgs. This is the output of ceph -s now:
cluster 5400bbc9-378d-4c69-afc4-da71393f7baf
health HEALTH_WARN
66 pgs peering
2 pgs repair
66 pgs stuck inactive
66 pgs stuck unclean
4 requests are blocked > 32 sec
pool images pg_num 256 > pgp_num 128
monmap e2: 2 mons at {0=192.168.2.1:6789/0,1=192.168.2.3:6789/0}
election epoch 16, quorum 0,1 0,1
osdmap e214793: 9 osds: 9 up, 9 in; 4 remapped pgs
pgmap v1400150: 256 pgs, 1 pools, 4377 GB data, 1105 kobjects
8801 GB used, 15361 GB / 24162 GB avail
188 active+clean
62 peering
4 remapped+peering
2 active+clean+scrubbing+deep+repair
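
To put a number on the improvement, here is a small sketch that compares the pg state counts from this ceph -s output with the earlier one quoted further down. The counts are copied by hand from the two outputs; the helper name inactive is only for illustration and is not part of any Ceph tool.

```python
# PG state counts copied from the two `ceph -s` outputs in this thread.
before = {"active+clean": 174, "peering": 78, "remapped+peering": 4}
after = {
    "active+clean": 188,
    "peering": 62,
    "remapped+peering": 4,
    "active+clean+scrubbing+deep+repair": 2,
}


def inactive(counts):
    """Sum the PGs whose state string has no 'active' component."""
    return sum(n for state, n in counts.items() if "active" not in state.split("+"))


# Both snapshots should account for all 256 PGs in the pool.
print("total PGs:", sum(before.values()), sum(after.values()))
print("inactive PGs before/after:", inactive(before), inactive(after))
```

The two pgs in repair still count as active, so only the peering and remapped+peering pgs are treated as inactive here.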
Pete
On 15 November 2015 at 18:04, Peter Theobald <[email protected]> wrote:
> I still have the pgs stuck peering. I ran ceph pg n.nn query on a few of
> the pgs that are stuck. The ones that are just peering have a few entries
> in recovery_state -> past_intervals (example at end of message) and the
> ones that say remapped+peering have a long entry here. I don't know how to
> interpret the pg query output, but I have a feeling that I have had writes
> to different nodes and that has messed up a few objects. I have a lot of
> network traffic between the nodes, a few hundred Mbps, which would fit with
> osds trying to work out their state (9 disks with a random IO pattern would
> fit with the level of bandwidth I'm seeing).
>
> This is the full output of ceph health detail
>
> sudo ceph health detail
> HEALTH_WARN 82 pgs peering; 82 pgs stuck inactive; 82 pgs stuck unclean; 1
> requests are blocked > 32 sec; 1 osds have slow requests; pool images
> pg_num 256 > pgp_num 128
> pg 3.21 is stuck inactive for 115937.161742, current state peering, last
> acting [7,5]
> pg 3.80 is stuck inactive for 115913.708453, current state peering, last
> acting [8,6]
> pg 3.23 is stuck inactive for 156640.618069, current state peering, last
> acting [8,3]
> pg 3.82 is stuck inactive for 115931.967078, current state peering, last
> acting [1,5]
> pg 3.e1 is stuck inactive for 116121.694227, current state peering, last
> acting [0,6]
> pg 3.1c is stuck inactive for 115916.431120, current state peering, last
> acting [8,3]
> pg 3.7e is stuck inactive for 115918.390949, current state peering, last
> acting [0,3]
> pg 3.18 is stuck inactive for 115908.250832, current state peering, last
> acting [8,6]
> pg 3.79 is stuck inactive for 115914.617676, current state peering, last
> acting [8,3]
> pg 3.d8 is stuck inactive for 116341.813279, current state peering, last
> acting [2,6]
> pg 3.1b is stuck inactive for 115905.061074, current state peering, last
> acting [7,4]
> pg 3.d9 is stuck inactive for 156650.199216, current state peering, last
> acting [8,3]
> pg 3.db is stuck inactive for 115915.924073, current state peering, last
> acting [1,5]
> pg 3.d4 is stuck inactive for 115918.396086, current state peering, last
> acting [0,3]
> pg 3.17 is stuck inactive for 115915.304764, current state peering, last
> acting [0,3]
> pg 3.70 is stuck inactive for 115915.000395, current state peering, last
> acting [7,6]
> pg 3.12 is stuck inactive for 115916.466955, current state peering, last
> acting [8,3]
> pg 3.13 is stuck inactive for 244912.512309, current state
> remapped+peering, last acting [6,0]
> pg 3.d2 is stuck inactive for 115913.708294, current state peering, last
> acting [8,3]
> pg 3.6d is stuck inactive for 115909.860193, current state peering, last
> acting [8,4]
> pg 3.6e is stuck inactive for 115914.617561, current state peering, last
> acting [8,3]
> pg 3.9 is stuck inactive for 244908.745661, current state
> remapped+peering, last acting [4,2]
> pg 3.68 is stuck inactive for 115916.701060, current state peering, last
> acting [7,3]
> pg 3.6a is stuck inactive for 115914.617589, current state peering, last
> acting [8,3]
> pg 3.4 is stuck inactive for 115913.708054, current state peering, last
> acting [8,3]
> pg 3.ca is stuck inactive for 115915.923728, current state peering, last
> acting [0,6]
> pg 3.64 is stuck inactive for 115905.061782, current state peering, last
> acting [7,4]
> pg 3.6 is stuck inactive for 115913.708077, current state peering, last
> acting [8,3]
> pg 3.0 is stuck inactive for 116106.189550, current state peering, last
> acting [8,6]
> pg 3.c6 is stuck inactive for 115905.061588, current state peering, last
> acting [7,4]
> pg 3.2 is stuck inactive for 116351.261968, current state peering, last
> acting [1,5]
> pg 3.61 is stuck inactive for 115913.854102, current state peering, last
> acting [0,6]
> pg 3.c0 is stuck inactive for 115916.700785, current state peering, last
> acting [7,3]
> pg 3.c2 is stuck inactive for 115913.708368, current state peering, last
> acting [8,6]
> pg 3.bd is stuck inactive for 115909.142185, current state peering, last
> acting [0,4]
> pg 3.58 is stuck inactive for 116290.453805, current state peering, last
> acting [2,6]
> pg 3.59 is stuck inactive for 156592.727428, current state peering, last
> acting [8,3]
> pg 3.5b is stuck inactive for 115915.927480, current state peering, last
> acting [1,5]
> pg 3.54 is stuck inactive for 115918.391135, current state peering, last
> acting [0,3]
> pg 3.bb is stuck inactive for 115918.138327, current state peering, last
> acting [0,3]
> pg 3.b5 is stuck inactive for 156609.811401, current state peering, last
> acting [7,3]
> pg 3.52 is stuck inactive for 115914.617727, current state peering, last
> acting [8,3]
> pg 3.b1 is stuck inactive for 115910.407513, current state peering, last
> acting [1,4]
> pg 3.b3 is stuck inactive for 116204.050176, current state peering, last
> acting [0,6]
> pg 3.af is stuck inactive for 115908.304844, current state peering, last
> acting [1,6]
> pg 3.a8 is stuck inactive for 115909.753895, current state peering, last
> acting [8,5]
> pg 3.4a is stuck inactive for 115913.854219, current state peering, last
> acting [0,6]
> pg 3.a9 is stuck inactive for 115905.061347, current state peering, last
> acting [7,4]
> pg 3.a4 is stuck inactive for 115909.753923, current state peering, last
> acting [8,4]
> pg 3.46 is stuck inactive for 115905.061894, current state peering, last
> acting [7,4]
> pg 3.40 is stuck inactive for 115916.701055, current state peering, last
> acting [7,3]
> pg 3.a0 is stuck inactive for 156540.416593, current state peering, last
> acting [7,6]
> pg 3.42 is stuck inactive for 116084.025651, current state peering, last
> acting [8,6]
> pg 3.a1 is stuck inactive for 115905.061404, current state peering, last
> acting [7,5]
> pg 3.a3 is stuck inactive for 156592.632676, current state peering, last
> acting [8,3]
> pg 3.3d is stuck inactive for 115909.536349, current state peering, last
> acting [0,4]
> pg 3.9c is stuck inactive for 115913.639973, current state peering, last
> acting [8,3]
> pg 3.fe is stuck inactive for 115915.304682, current state peering, last
> acting [0,3]
> pg 3.98 is stuck inactive for 115908.287692, current state peering, last
> acting [8,6]
> pg 3.f9 is stuck inactive for 115913.708198, current state peering, last
> acting [8,3]
> pg 3.3b is stuck inactive for 115915.304652, current state peering, last
> acting [0,3]
> pg 3.9b is stuck inactive for 115905.061445, current state peering, last
> acting [7,4]
> pg 3.35 is stuck inactive for 156760.780737, current state peering, last
> acting [7,3]
> pg 3.97 is stuck inactive for 115913.854036, current state peering, last
> acting [0,3]
> pg 3.31 is stuck inactive for 115910.565637, current state peering, last
> acting [1,4]
> pg 3.f0 is stuck inactive for 115915.000192, current state peering, last
> acting [7,6]
> pg 3.33 is stuck inactive for 115908.911398, current state peering, last
> acting [0,6]
> pg 3.92 is stuck inactive for 115914.503597, current state peering, last
> acting [8,3]
> pg 3.93 is stuck inactive for 244912.512404, current state
> remapped+peering, last acting [6,0]
> pg 3.2f is stuck inactive for 115980.326105, current state peering, last
> acting [1,6]
> pg 3.ed is stuck inactive for 115909.859689, current state peering, last
> acting [8,4]
> pg 3.28 is stuck inactive for 115913.708757, current state peering, last
> acting [8,5]
> pg 3.ee is stuck inactive for 115913.708285, current state peering, last
> acting [8,3]
> pg 3.29 is stuck inactive for 115905.062092, current state peering, last
> acting [7,4]
> pg 3.89 is stuck inactive for 244908.745759, current state
> remapped+peering, last acting [4,2]
> pg 3.e8 is stuck inactive for 115916.700729, current state peering, last
> acting [7,3]
> pg 3.24 is stuck inactive for 115909.860570, current state peering, last
> acting [8,4]
> pg 3.ea is stuck inactive for 115913.708316, current state peering, last
> acting [8,3]
> pg 3.84 is stuck inactive for 115913.708549, current state peering, last
> acting [8,3]
> pg 3.e4 is stuck inactive for 115905.061352, current state peering, last
> acting [7,4]
> pg 3.86 is stuck inactive for 115914.617720, current state peering, last
> acting [8,3]
> pg 3.20 is stuck inactive for 156654.164647, current state peering, last
> acting [7,6]
> pg 3.21 is stuck unclean for 115937.161932, current state peering, last
> acting [7,5]
> pg 3.80 is stuck unclean for 115913.708641, current state peering, last
> acting [8,6]
> pg 3.23 is stuck unclean for 156640.618257, current state peering, last
> acting [8,3]
> pg 3.82 is stuck unclean for 115931.967266, current state peering, last
> acting [1,5]
> pg 3.e1 is stuck unclean for 116121.694416, current state peering, last
> acting [0,6]
> pg 3.1c is stuck unclean for 115916.431308, current state peering, last
> acting [8,3]
> pg 3.7e is stuck unclean for 115918.391137, current state peering, last
> acting [0,3]
> pg 3.18 is stuck unclean for 115908.251019, current state peering, last
> acting [8,6]
> pg 3.79 is stuck unclean for 115914.617864, current state peering, last
> acting [8,3]
> pg 3.d8 is stuck unclean for 116341.813466, current state peering, last
> acting [2,6]
> pg 3.1b is stuck unclean for 115905.061262, current state peering, last
> acting [7,4]
> pg 3.d9 is stuck unclean for 156650.199403, current state peering, last
> acting [8,3]
> pg 3.db is stuck unclean for 115915.924260, current state peering, last
> acting [1,5]
> pg 3.d4 is stuck unclean for 115918.396273, current state peering, last
> acting [0,3]
> pg 3.17 is stuck unclean for 115915.304951, current state peering, last
> acting [0,3]
> pg 3.70 is stuck unclean for 115915.000581, current state peering, last
> acting [7,6]
> pg 3.12 is stuck unclean for 115916.467142, current state peering, last
> acting [8,3]
> pg 3.13 is stuck unclean for 254650.057287, current state
> remapped+peering, last acting [6,0]
> pg 3.d2 is stuck unclean for 115913.708481, current state peering, last
> acting [8,3]
> pg 3.6d is stuck unclean for 115909.860380, current state peering, last
> acting [8,4]
> pg 3.6e is stuck unclean for 115914.617747, current state peering, last
> acting [8,3]
> pg 3.9 is stuck unclean for 255316.515662, current state remapped+peering,
> last acting [4,2]
> pg 3.68 is stuck unclean for 115916.701246, current state peering, last
> acting [7,3]
> pg 3.6a is stuck unclean for 115914.617775, current state peering, last
> acting [8,3]
> pg 3.4 is stuck unclean for 115913.708241, current state peering, last
> acting [8,3]
> pg 3.ca is stuck unclean for 115915.923915, current state peering, last
> acting [0,6]
> pg 3.64 is stuck unclean for 115905.061969, current state peering, last
> acting [7,4]
> pg 3.6 is stuck unclean for 115913.708264, current state peering, last
> acting [8,3]
> pg 3.0 is stuck unclean for 116106.189737, current state peering, last
> acting [8,6]
> pg 3.c6 is stuck unclean for 115905.061775, current state peering, last
> acting [7,4]
> pg 3.2 is stuck unclean for 116351.262155, current state peering, last
> acting [1,5]
> pg 3.61 is stuck unclean for 115913.854289, current state peering, last
> acting [0,6]
> pg 3.c0 is stuck unclean for 115916.700973, current state peering, last
> acting [7,3]
> pg 3.c2 is stuck unclean for 115913.708556, current state peering, last
> acting [8,6]
> pg 3.bd is stuck unclean for 115909.142373, current state peering, last
> acting [0,4]
> pg 3.58 is stuck unclean for 116290.453992, current state peering, last
> acting [2,6]
> pg 3.59 is stuck unclean for 156592.727616, current state peering, last
> acting [8,3]
> pg 3.5b is stuck unclean for 115915.927668, current state peering, last
> acting [1,5]
> pg 3.54 is stuck unclean for 115918.391323, current state peering, last
> acting [0,3]
> pg 3.bb is stuck unclean for 115918.138514, current state peering, last
> acting [0,3]
> pg 3.b5 is stuck unclean for 156609.811589, current state peering, last
> acting [7,3]
> pg 3.52 is stuck unclean for 115914.617914, current state peering, last
> acting [8,3]
> pg 3.b1 is stuck unclean for 115910.407700, current state peering, last
> acting [1,4]
> pg 3.b3 is stuck unclean for 116204.050364, current state peering, last
> acting [0,6]
> pg 3.af is stuck unclean for 115908.305031, current state peering, last
> acting [1,6]
> pg 3.a8 is stuck unclean for 115909.754082, current state peering, last
> acting [8,5]
> pg 3.4a is stuck unclean for 115913.854406, current state peering, last
> acting [0,6]
> pg 3.a9 is stuck unclean for 115905.061535, current state peering, last
> acting [7,4]
> pg 3.a4 is stuck unclean for 115909.754111, current state peering, last
> acting [8,4]
> pg 3.46 is stuck unclean for 115905.062087, current state peering, last
> acting [7,4]
> pg 3.40 is stuck unclean for 115916.701248, current state peering, last
> acting [7,3]
> pg 3.a0 is stuck unclean for 156540.416786, current state peering, last
> acting [7,6]
> pg 3.42 is stuck unclean for 116084.025844, current state peering, last
> acting [8,6]
> pg 3.a1 is stuck unclean for 115905.061597, current state peering, last
> acting [7,5]
> pg 3.a3 is stuck unclean for 156592.632868, current state peering, last
> acting [8,3]
> pg 3.3d is stuck unclean for 115909.536541, current state peering, last
> acting [0,4]
> pg 3.9c is stuck unclean for 115913.640165, current state peering, last
> acting [8,3]
> pg 3.fe is stuck unclean for 115915.304874, current state peering, last
> acting [0,3]
> pg 3.98 is stuck unclean for 115908.287885, current state peering, last
> acting [8,6]
> pg 3.f9 is stuck unclean for 115913.708390, current state peering, last
> acting [8,3]
> pg 3.3b is stuck unclean for 115915.304844, current state peering, last
> acting [0,3]
> pg 3.9b is stuck unclean for 115905.061638, current state peering, last
> acting [7,4]
> pg 3.35 is stuck unclean for 156760.780929, current state peering, last
> acting [7,3]
> pg 3.97 is stuck unclean for 115913.854229, current state peering, last
> acting [0,3]
> pg 3.31 is stuck unclean for 115910.565829, current state peering, last
> acting [1,4]
> pg 3.f0 is stuck unclean for 115915.000385, current state peering, last
> acting [7,6]
> pg 3.33 is stuck unclean for 115908.911591, current state peering, last
> acting [0,6]
> pg 3.92 is stuck unclean for 115914.503790, current state peering, last
> acting [8,3]
> pg 3.93 is stuck unclean for 254650.057387, current state
> remapped+peering, last acting [6,0]
> pg 3.2f is stuck unclean for 115980.326297, current state peering, last
> acting [1,6]
> pg 3.ed is stuck unclean for 115909.859881, current state peering, last
> acting [8,4]
> pg 3.28 is stuck unclean for 115913.708950, current state peering, last
> acting [8,5]
> pg 3.ee is stuck unclean for 115913.708477, current state peering, last
> acting [8,3]
> pg 3.29 is stuck unclean for 115905.062284, current state peering, last
> acting [7,4]
> pg 3.89 is stuck unclean for 255316.515766, current state
> remapped+peering, last acting [4,2]
> pg 3.e8 is stuck unclean for 115916.700921, current state peering, last
> acting [7,3]
> pg 3.24 is stuck unclean for 115909.860762, current state peering, last
> acting [8,4]
> pg 3.ea is stuck unclean for 115913.708507, current state peering, last
> acting [8,3]
> pg 3.84 is stuck unclean for 115913.708741, current state peering, last
> acting [8,3]
> pg 3.e4 is stuck unclean for 115905.061544, current state peering, last
> acting [7,4]
> pg 3.86 is stuck unclean for 115914.617912, current state peering, last
> acting [8,3]
> pg 3.20 is stuck unclean for 156654.164838, current state peering, last
> acting [7,6]
> pg 3.ed is peering, acting [8,4]
> pg 3.ee is peering, acting [8,3]
> pg 3.e8 is peering, acting [7,3]
> pg 3.ea is peering, acting [8,3]
> pg 3.e4 is peering, acting [7,4]
> pg 3.e1 is peering, acting [0,6]
> pg 3.d8 is peering, acting [2,6]
> pg 3.d9 is peering, acting [8,3]
> pg 3.db is peering, acting [1,5]
> pg 3.d4 is peering, acting [0,3]
> pg 3.d2 is peering, acting [8,3]
> pg 3.ca is peering, acting [0,6]
> pg 3.c6 is peering, acting [7,4]
> pg 3.c0 is peering, acting [7,3]
> pg 3.c2 is peering, acting [8,6]
> pg 3.bd is peering, acting [0,4]
> pg 3.bb is peering, acting [0,3]
> pg 3.b5 is peering, acting [7,3]
> pg 3.b1 is peering, acting [1,4]
> pg 3.b3 is peering, acting [0,6]
> pg 3.af is peering, acting [1,6]
> pg 3.a8 is peering, acting [8,5]
> pg 3.a9 is peering, acting [7,4]
> pg 3.a4 is peering, acting [8,4]
> pg 3.a0 is peering, acting [7,6]
> pg 3.a1 is peering, acting [7,5]
> pg 3.a3 is peering, acting [8,3]
> pg 3.9c is peering, acting [8,3]
> pg 3.98 is peering, acting [8,6]
> pg 3.9b is peering, acting [7,4]
> pg 3.97 is peering, acting [0,3]
> pg 3.92 is peering, acting [8,3]
> pg 3.93 is remapped+peering, acting [6,0]
> pg 3.89 is remapped+peering, acting [4,2]
> pg 3.84 is peering, acting [8,3]
> pg 3.86 is peering, acting [8,3]
> pg 3.80 is peering, acting [8,6]
> pg 3.82 is peering, acting [1,5]
> pg 3.7e is peering, acting [0,3]
> pg 3.79 is peering, acting [8,3]
> pg 3.70 is peering, acting [7,6]
> pg 3.6d is peering, acting [8,4]
> pg 3.6e is peering, acting [8,3]
> pg 3.68 is peering, acting [7,3]
> pg 3.6a is peering, acting [8,3]
> pg 3.64 is peering, acting [7,4]
> pg 3.61 is peering, acting [0,6]
> pg 3.58 is peering, acting [2,6]
> pg 3.59 is peering, acting [8,3]
> pg 3.5b is peering, acting [1,5]
> pg 3.54 is peering, acting [0,3]
> pg 3.52 is peering, acting [8,3]
> pg 3.4a is peering, acting [0,6]
> pg 3.46 is peering, acting [7,4]
> pg 3.40 is peering, acting [7,3]
> pg 3.42 is peering, acting [8,6]
> pg 3.3d is peering, acting [0,4]
> pg 3.3b is peering, acting [0,3]
> pg 3.35 is peering, acting [7,3]
> pg 3.31 is peering, acting [1,4]
> pg 3.33 is peering, acting [0,6]
> pg 3.2f is peering, acting [1,6]
> pg 3.28 is peering, acting [8,5]
> pg 3.29 is peering, acting [7,4]
> pg 3.24 is peering, acting [8,4]
> pg 3.20 is peering, acting [7,6]
> pg 3.21 is peering, acting [7,5]
> pg 3.23 is peering, acting [8,3]
> pg 3.1c is peering, acting [8,3]
> pg 3.18 is peering, acting [8,6]
> pg 3.1b is peering, acting [7,4]
> pg 3.17 is peering, acting [0,3]
> pg 3.12 is peering, acting [8,3]
> pg 3.13 is remapped+peering, acting [6,0]
> pg 3.9 is remapped+peering, acting [4,2]
> pg 3.4 is peering, acting [8,3]
> pg 3.6 is peering, acting [8,3]
> pg 3.0 is peering, acting [8,6]
> pg 3.2 is peering, acting [1,5]
> pg 3.fe is peering, acting [0,3]
> pg 3.f9 is peering, acting [8,3]
> pg 3.f0 is peering, acting [7,6]
> 1 ops are blocked > 134218 sec
> 1 ops are blocked > 134218 sec on osd.8
> 1 osds have slow requests
> pool images pg_num 256 > pgp_num 128
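
A listing this long is easier to read when tallied per OSD. The sketch below counts how often each OSD appears in the acting set of a stuck pg. It assumes the mail line-wrapping has been undone so that each pg is on one line, and it uses only four entries copied from the listing above as sample input; the regular expression and the function name tally_osds are my own and not part of Ceph.

```python
import re
from collections import Counter

# Four entries copied from the listing above, with the mail wrapping undone.
sample = """\
pg 3.21 is stuck inactive for 115937.161742, current state peering, last acting [7,5]
pg 3.80 is stuck inactive for 115913.708453, current state peering, last acting [8,6]
pg 3.23 is stuck inactive for 156640.618069, current state peering, last acting [8,3]
pg 3.13 is stuck inactive for 244912.512309, current state remapped+peering, last acting [6,0]
"""

# Captures the pg id, its state, and the comma-separated acting set.
LINE = re.compile(
    r"pg (\S+) is stuck inactive for [\d.]+, current state (\S+), "
    r"last acting \[([\d,]+)\]"
)


def tally_osds(text):
    """Count appearances of each OSD id in the acting sets of stuck PGs."""
    per_osd = Counter()
    for _pgid, _state, acting in LINE.findall(text):
        for osd in acting.split(","):
            per_osd[int(osd)] += 1
    return per_osd


counts = tally_osds(sample)
for osd, n in counts.most_common():
    print(f"osd.{osd}: {n} stuck pgs")
```

Run against the full listing, a tally like this would show whether the stuck pgs cluster on a few OSDs (osd.8 is the one reported with blocked ops).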
>
>
>
> And this is the output of ceph pg 3.a9 query. The stats section looks to
> be important: the number of bytes recovered is significantly larger than
> the size of the pg.
> {
> "state": "peering",
> "snap_trimq": "[]",
> "epoch": 211256,
> "up": [
> 7,
> 4
> ],
> "acting": [
> 7,
> 4
> ],
> "info": {
> "pgid": "3.a9",
> "last_update": "3359'110581",
> "last_complete": "3359'110581",
> "log_tail": "850'107578",
> "last_user_version": 110581,
> "last_backfill": "MAX",
> "purged_snaps": "[]",
> "history": {
> "epoch_created": 31,
> "last_epoch_started": 116841,
> "last_epoch_clean": 116844,
> "last_epoch_split": 0,
> "same_up_since": 116838,
> "same_interval_since": 126562,
> "same_primary_since": 1202,
> "last_scrub": "3359'110581",
> "last_scrub_stamp": "2015-11-13 13:22:55.682647",
> "last_deep_scrub": "987'109658",
> "last_deep_scrub_stamp": "2015-11-09 13:56:36.850047",
> "last_clean_scrub_stamp": "2015-11-13 13:22:55.682647"
> },
> "stats": {
> "version": "3359'110581",
> "reported_seq": "103843",
> "reported_epoch": "211192",
> "state": "peering",
> "last_fresh": "2015-11-15 17:45:30.009129",
> "last_change": "2015-11-14 11:25:20.451898",
> "last_active": "2015-11-14 09:35:03.312840",
> "last_peered": "2015-11-14 09:35:03.312840",
> "last_clean": "2015-11-14 09:35:03.312840",
> "last_became_active": "0.000000",
> "last_became_peered": "0.000000",
> "last_unstale": "2015-11-15 17:45:30.009129",
> "last_undegraded": "2015-11-15 17:45:30.009129",
> "last_fullsized": "2015-11-15 17:45:30.009129",
> "mapping_epoch": 94611,
> "log_start": "850'107578",
> "ondisk_log_start": "850'107578",
> "created": 31,
> "last_epoch_clean": 116844,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "3359'110581",
> "last_scrub_stamp": "2015-11-13 13:22:55.682647",
> "last_deep_scrub": "987'109658",
> "last_deep_scrub_stamp": "2015-11-09 13:56:36.850047",
> "last_clean_scrub_stamp": "2015-11-13 13:22:55.682647",
> "log_size": 3003,
> "ondisk_log_size": 3003,
> "stats_invalid": "1",
> "stat_sum": {
> "num_bytes": 18268690441,
> "num_objects": 4402,
> "num_object_clones": 0,
> "num_object_copies": 8804,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_misplaced": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 4402,
> "num_whiteouts": 0,
> "num_read": 2268,
> "num_read_kb": 31055,
> "num_write": 8111,
> "num_write_kb": 1762444,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 13228,
> "num_bytes_recovered": 54922698769,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0,
> "num_bytes_hit_set_archive": 0
> },
> "up": [
> 7,
> 4
> ],
> "acting": [
> 7,
> 4
> ],
> "blocked_by": [
> 4
> ],
> "up_primary": 7,
> "acting_primary": 7
> },
> "empty": 0,
> "dne": 0,
> "incomplete": 0,
> "last_epoch_started": 116841,
> "hit_set_history": {
> "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": {
> "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"
> },
> "history": []
> }
> },
> "peer_info": [],
> "recovery_state": [
> {
> "name": "Started\/Primary\/Peering\/GetInfo",
> "enter_time": "2015-11-14 11:25:20.451888",
> "requested_info_from": [
> {
> "osd": "4"
> }
> ]
> },
> {
> "name": "Started\/Primary\/Peering",
> "enter_time": "2015-11-14 11:25:20.451882",
> "past_intervals": [
> {
> "first": 116838,
> "last": 120813,
> "maybe_went_rw": 1,
> "up": [
> 7,
> 4
> ],
> "acting": [
> 7,
> 4
> ],
> "primary": 7,
> "up_primary": 7
> },
> {
> "first": 120814,
> "last": 120889,
> "maybe_went_rw": 1,
> "up": [
> 7,
> 4
> ],
> "acting": [
> 7,
> 4
> ],
> "primary": 7,
> "up_primary": 7
> },
> {
> "first": 120890,
> "last": 126561,
> "maybe_went_rw": 1,
> "up": [
> 7,
> 4
> ],
> "acting": [
> 7,
> 4
> ],
> "primary": 7,
> "up_primary": 7
> }
> ],
> "probing_osds": [
> "4",
> "7"
> ],
> "down_osds_we_would_probe": [],
> "peering_blocked_by": []
> },
> {
> "name": "Started",
> "enter_time": "2015-11-14 11:25:20.451851"
> }
> ],
> "agent_state": {}
> }
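
For reference, the fields that seem most relevant can be pulled out of a pg query programmatically. This is a minimal sketch: the JSON is a hand-trimmed excerpt of the output above (same key names and values, most fields dropped), and the function name summarize is hypothetical.

```python
import json

# Excerpt of the `ceph pg 3.a9 query` output above, reduced to the fields used here.
excerpt = """
{
  "state": "peering",
  "info": {
    "stats": {
      "stat_sum": {
        "num_bytes": 18268690441,
        "num_objects": 4402,
        "num_objects_unfound": 0,
        "num_bytes_recovered": 54922698769
      },
      "blocked_by": [4]
    }
  }
}
"""


def summarize(pg_query):
    """Pick out state, blocking OSDs, unfound count and recovered/size ratio."""
    stats = pg_query["info"]["stats"]
    stat_sum = stats["stat_sum"]
    return {
        "state": pg_query["state"],
        "blocked_by": stats["blocked_by"],
        "unfound": stat_sum["num_objects_unfound"],
        "recovered_ratio": stat_sum["num_bytes_recovered"] / stat_sum["num_bytes"],
    }


summary = summarize(json.loads(excerpt))
print(summary)
```

For this pg the recovered byte count is roughly three times num_bytes, there are no unfound objects, and blocked_by names osd.4.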
>
>
> Regards
> Pete
>
> On 15 November 2015 at 01:26, Peter Theobald <[email protected]> wrote:
>
>> Hi Gregory,
>> This is the output of ceph -s
>> cluster 5400bbc9-378d-4c69-afc4-da71393f7baf
>> health HEALTH_WARN
>> 82 pgs peering
>> 82 pgs stuck inactive
>> 82 pgs stuck unclean
>> 1 requests are blocked > 32 sec
>> pool images pg_num 256 > pgp_num 128
>> monmap e2: 2 mons at {0=192.168.2.1:6789/0,1=192.168.2.3:6789/0}
>> election epoch 16, quorum 0,1 0,1
>> osdmap e168004: 9 osds: 9 up, 9 in; 4 remapped pgs
>> pgmap v1317963: 256 pgs, 1 pools, 4377 GB data, 1105 kobjects
>> 8792 GB used, 15369 GB / 24162 GB avail
>> 174 active+clean
>> 78 peering
>> 4 remapped+peering
>>
>>
>> Total available space is about 24 TB. Used space is about 8 TB at a
>> replication level of 2.
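
As a quick sanity check on those figures, the arithmetic can be sketched as below. The numbers are taken from the pgmap lines of the ceph -s output above; the variable names are just for illustration.

```python
# Figures from the pgmap section of the `ceph -s` output above.
data_gb = 4377    # logical data stored in the pool
used_gb = 8792    # raw space used across all OSDs
total_gb = 24162  # raw capacity across all OSDs
replicas = 2      # replication level stated in the message

# With two copies of every object, raw usage should be about twice the data size.
expected_used_gb = data_gb * replicas
used_fraction = used_gb / total_gb

print("expected raw usage (GB):", expected_used_gb)
print("reported raw usage (GB):", used_gb)
print("raw capacity used (%):", round(used_fraction * 100, 1))
```

The reported usage is close to twice the data size, and a little over a third of the raw capacity is in use, so the cluster does not look like it is filling up.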
>>
>> Regards
>> Pete
>>
>> On 14 November 2015 at 18:03, Gregory Farnum <[email protected]> wrote:
>>
>>> What's the full output of "ceph -s"? Are your new crush rules actually
>>> satisfiable? Is your cluster filling up?
>>> -Greg
>>>
>>>
>>> On Saturday, November 14, 2015, Peter Theobald <[email protected]>
>>> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I have a 3 node ceph cluster with a total of 9 osds (2, 3 and 4 per node,
>>>> with different size drives). I changed the layout (failure domain from per
>>>> osd to per host, and changed min_size) and I now have a few pgs stuck in
>>>> peering or remapped+peering for a couple of days now.
>>>>
>>>> The hosts are underpowered: 2x HP MicroServers and a single i5
>>>> desktop-grade machine, so not super powerful. The network is fast though
>>>> (bonded Gb Ethernet with a dedicated switch).
>>>>
>>>> I'm concerned that the remapped+peering pgs are stuck. All the pgs in
>>>> peering or remapped+peering are stuck inactive and unclean, so I'm concerned
>>>> about data loss. Do I just need to wait for them to fix themselves? I
>>>> cannot see any mention of unfound objects when I query the remapped pgs, so
>>>> I think I'm ok and just need to be patient. I have 128 pgs across 9 osds, so
>>>> I probably have a lot of objects per pg. Total data is about 4TB.
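
A rough sketch of the per-pg load, for scale. The object and data totals come from the ceph -s output posted later in this thread (1105 kobjects, 4377 GB), and both pg counts are shown because the pool reports pg_num 256 while pgp_num is still 128. The function name per_pg is hypothetical.

```python
# Totals from the pgmap line of `ceph -s` posted later in this thread.
total_objects = 1105 * 1000  # "1105 kobjects"
total_data_gb = 4377         # "4377 GB data"


def per_pg(pg_count):
    """Average objects and data (GB) per placement group for a given PG count."""
    return total_objects / pg_count, total_data_gb / pg_count


for pg_count in (128, 256):
    objects, data_gb = per_pg(pg_count)
    print(f"{pg_count} pgs: ~{objects:.0f} objects and ~{data_gb:.1f} GB per pg")
```

With 256 pgs the average comes out to a few thousand objects and roughly 17 GB per pg, which is in line with the num_objects and num_bytes values in the pg 3.a9 query.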
>>>>
>>>> Regards
>>>>
>>>> Pete
>>>>
>>>
>>
>