I spotted a section in the pg query output about dirty objects, so I've looked
into that. The Ceph documentation is very light on this, but I found the osd
and pg repair commands. I issued the osd repair command and the number of
unclean pgs has now dropped from 82 to 66. This is the output of ceph -s now:

    cluster 5400bbc9-378d-4c69-afc4-da71393f7baf
     health HEALTH_WARN
            66 pgs peering
            2 pgs repair
            66 pgs stuck inactive
            66 pgs stuck unclean
            4 requests are blocked > 32 sec
            pool images pg_num 256 > pgp_num 128
     monmap e2: 2 mons at {0=192.168.2.1:6789/0,1=192.168.2.3:6789/0}
            election epoch 16, quorum 0,1 0,1
     osdmap e214793: 9 osds: 9 up, 9 in; 4 remapped pgs
      pgmap v1400150: 256 pgs, 1 pools, 4377 GB data, 1105 kobjects
            8801 GB used, 15361 GB / 24162 GB avail
                 188 active+clean
                  62 peering
                   4 remapped+peering
                   2 active+clean+scrubbing+deep+repair

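The short per-pg lines at the end of the ceph health detail output quoted
below can be tallied per OSD, to see whether the stuck pgs cluster on a few
disks. This is a minimal Python sketch, assuming the output has been saved as
text. The function name tally_acting_osds and the regular expression are
illustrative only and are not part of any Ceph tooling.

```python
import re
from collections import Counter

# Matches the short per-pg lines of `ceph health detail`, e.g.
#   pg 3.ed is peering, acting [8,4]
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,]+)\]$")


def tally_acting_osds(text):
    """Count, per OSD id, how many listed pgs have it in their acting set."""
    counts = Counter()
    for line in text.splitlines():
        # Drop mail quote markers ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = PG_LINE.match(line)
        if match is None:
            continue  # summary line, wrapped "stuck" line, etc.
        for osd in match.group(3).split(","):
            counts[int(osd)] += 1
    return counts


# Four lines copied from the health detail output quoted below.
sample = """\
> pg 3.ed is peering, acting [8,4]
> pg 3.ee is peering, acting [8,3]
> pg 3.e8 is peering, acting [7,3]
> pg 3.93 is remapped+peering, acting [6,0]
"""

if __name__ == "__main__":
    for osd, n in tally_acting_osds(sample).most_common():
        print(f"osd.{osd}: {n} pgs")
```

Run against the full output, an OSD that appears in most of the stuck acting
sets would probably be the first one worth looking at.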

Pete

On 15 November 2015 at 18:04, Peter Theobald <[email protected]> wrote:

> I still have the pgs stuck peering. I ran ceph pg n.nn query on a few of
> the pgs that are stuck. The ones that are just peering have a few entries
> in recovery_state -> past_intervals (example at end of message) and the
> ones that say remapped+peering have a long entry here. I don't know what
> the content of pg query means, but I have a feeling that I have had writes
> to different nodes and that has messed up a few objects. I have a lot of
> network traffic between the nodes, a few hundred Mbps, which would fit with
> osds trying to work out their state (9 disks with a random IO pattern would
> fit with the level of bandwidth I'm seeing).
>
> This is the full output of ceph health detail
>
> sudo ceph health detail
> HEALTH_WARN 82 pgs peering; 82 pgs stuck inactive; 82 pgs stuck unclean; 1
> requests are blocked > 32 sec; 1 osds have slow requests; pool images
> pg_num 256 > pgp_num 128
> pg 3.21 is stuck inactive for 115937.161742, current state peering, last
> acting [7,5]
> pg 3.80 is stuck inactive for 115913.708453, current state peering, last
> acting [8,6]
> pg 3.23 is stuck inactive for 156640.618069, current state peering, last
> acting [8,3]
> pg 3.82 is stuck inactive for 115931.967078, current state peering, last
> acting [1,5]
> pg 3.e1 is stuck inactive for 116121.694227, current state peering, last
> acting [0,6]
> pg 3.1c is stuck inactive for 115916.431120, current state peering, last
> acting [8,3]
> pg 3.7e is stuck inactive for 115918.390949, current state peering, last
> acting [0,3]
> pg 3.18 is stuck inactive for 115908.250832, current state peering, last
> acting [8,6]
> pg 3.79 is stuck inactive for 115914.617676, current state peering, last
> acting [8,3]
> pg 3.d8 is stuck inactive for 116341.813279, current state peering, last
> acting [2,6]
> pg 3.1b is stuck inactive for 115905.061074, current state peering, last
> acting [7,4]
> pg 3.d9 is stuck inactive for 156650.199216, current state peering, last
> acting [8,3]
> pg 3.db is stuck inactive for 115915.924073, current state peering, last
> acting [1,5]
> pg 3.d4 is stuck inactive for 115918.396086, current state peering, last
> acting [0,3]
> pg 3.17 is stuck inactive for 115915.304764, current state peering, last
> acting [0,3]
> pg 3.70 is stuck inactive for 115915.000395, current state peering, last
> acting [7,6]
> pg 3.12 is stuck inactive for 115916.466955, current state peering, last
> acting [8,3]
> pg 3.13 is stuck inactive for 244912.512309, current state
> remapped+peering, last acting [6,0]
> pg 3.d2 is stuck inactive for 115913.708294, current state peering, last
> acting [8,3]
> pg 3.6d is stuck inactive for 115909.860193, current state peering, last
> acting [8,4]
> pg 3.6e is stuck inactive for 115914.617561, current state peering, last
> acting [8,3]
> pg 3.9 is stuck inactive for 244908.745661, current state
> remapped+peering, last acting [4,2]
> pg 3.68 is stuck inactive for 115916.701060, current state peering, last
> acting [7,3]
> pg 3.6a is stuck inactive for 115914.617589, current state peering, last
> acting [8,3]
> pg 3.4 is stuck inactive for 115913.708054, current state peering, last
> acting [8,3]
> pg 3.ca is stuck inactive for 115915.923728, current state peering, last
> acting [0,6]
> pg 3.64 is stuck inactive for 115905.061782, current state peering, last
> acting [7,4]
> pg 3.6 is stuck inactive for 115913.708077, current state peering, last
> acting [8,3]
> pg 3.0 is stuck inactive for 116106.189550, current state peering, last
> acting [8,6]
> pg 3.c6 is stuck inactive for 115905.061588, current state peering, last
> acting [7,4]
> pg 3.2 is stuck inactive for 116351.261968, current state peering, last
> acting [1,5]
> pg 3.61 is stuck inactive for 115913.854102, current state peering, last
> acting [0,6]
> pg 3.c0 is stuck inactive for 115916.700785, current state peering, last
> acting [7,3]
> pg 3.c2 is stuck inactive for 115913.708368, current state peering, last
> acting [8,6]
> pg 3.bd is stuck inactive for 115909.142185, current state peering, last
> acting [0,4]
> pg 3.58 is stuck inactive for 116290.453805, current state peering, last
> acting [2,6]
> pg 3.59 is stuck inactive for 156592.727428, current state peering, last
> acting [8,3]
> pg 3.5b is stuck inactive for 115915.927480, current state peering, last
> acting [1,5]
> pg 3.54 is stuck inactive for 115918.391135, current state peering, last
> acting [0,3]
> pg 3.bb is stuck inactive for 115918.138327, current state peering, last
> acting [0,3]
> pg 3.b5 is stuck inactive for 156609.811401, current state peering, last
> acting [7,3]
> pg 3.52 is stuck inactive for 115914.617727, current state peering, last
> acting [8,3]
> pg 3.b1 is stuck inactive for 115910.407513, current state peering, last
> acting [1,4]
> pg 3.b3 is stuck inactive for 116204.050176, current state peering, last
> acting [0,6]
> pg 3.af is stuck inactive for 115908.304844, current state peering, last
> acting [1,6]
> pg 3.a8 is stuck inactive for 115909.753895, current state peering, last
> acting [8,5]
> pg 3.4a is stuck inactive for 115913.854219, current state peering, last
> acting [0,6]
> pg 3.a9 is stuck inactive for 115905.061347, current state peering, last
> acting [7,4]
> pg 3.a4 is stuck inactive for 115909.753923, current state peering, last
> acting [8,4]
> pg 3.46 is stuck inactive for 115905.061894, current state peering, last
> acting [7,4]
> pg 3.40 is stuck inactive for 115916.701055, current state peering, last
> acting [7,3]
> pg 3.a0 is stuck inactive for 156540.416593, current state peering, last
> acting [7,6]
> pg 3.42 is stuck inactive for 116084.025651, current state peering, last
> acting [8,6]
> pg 3.a1 is stuck inactive for 115905.061404, current state peering, last
> acting [7,5]
> pg 3.a3 is stuck inactive for 156592.632676, current state peering, last
> acting [8,3]
> pg 3.3d is stuck inactive for 115909.536349, current state peering, last
> acting [0,4]
> pg 3.9c is stuck inactive for 115913.639973, current state peering, last
> acting [8,3]
> pg 3.fe is stuck inactive for 115915.304682, current state peering, last
> acting [0,3]
> pg 3.98 is stuck inactive for 115908.287692, current state peering, last
> acting [8,6]
> pg 3.f9 is stuck inactive for 115913.708198, current state peering, last
> acting [8,3]
> pg 3.3b is stuck inactive for 115915.304652, current state peering, last
> acting [0,3]
> pg 3.9b is stuck inactive for 115905.061445, current state peering, last
> acting [7,4]
> pg 3.35 is stuck inactive for 156760.780737, current state peering, last
> acting [7,3]
> pg 3.97 is stuck inactive for 115913.854036, current state peering, last
> acting [0,3]
> pg 3.31 is stuck inactive for 115910.565637, current state peering, last
> acting [1,4]
> pg 3.f0 is stuck inactive for 115915.000192, current state peering, last
> acting [7,6]
> pg 3.33 is stuck inactive for 115908.911398, current state peering, last
> acting [0,6]
> pg 3.92 is stuck inactive for 115914.503597, current state peering, last
> acting [8,3]
> pg 3.93 is stuck inactive for 244912.512404, current state
> remapped+peering, last acting [6,0]
> pg 3.2f is stuck inactive for 115980.326105, current state peering, last
> acting [1,6]
> pg 3.ed is stuck inactive for 115909.859689, current state peering, last
> acting [8,4]
> pg 3.28 is stuck inactive for 115913.708757, current state peering, last
> acting [8,5]
> pg 3.ee is stuck inactive for 115913.708285, current state peering, last
> acting [8,3]
> pg 3.29 is stuck inactive for 115905.062092, current state peering, last
> acting [7,4]
> pg 3.89 is stuck inactive for 244908.745759, current state
> remapped+peering, last acting [4,2]
> pg 3.e8 is stuck inactive for 115916.700729, current state peering, last
> acting [7,3]
> pg 3.24 is stuck inactive for 115909.860570, current state peering, last
> acting [8,4]
> pg 3.ea is stuck inactive for 115913.708316, current state peering, last
> acting [8,3]
> pg 3.84 is stuck inactive for 115913.708549, current state peering, last
> acting [8,3]
> pg 3.e4 is stuck inactive for 115905.061352, current state peering, last
> acting [7,4]
> pg 3.86 is stuck inactive for 115914.617720, current state peering, last
> acting [8,3]
> pg 3.20 is stuck inactive for 156654.164647, current state peering, last
> acting [7,6]
> pg 3.21 is stuck unclean for 115937.161932, current state peering, last
> acting [7,5]
> pg 3.80 is stuck unclean for 115913.708641, current state peering, last
> acting [8,6]
> pg 3.23 is stuck unclean for 156640.618257, current state peering, last
> acting [8,3]
> pg 3.82 is stuck unclean for 115931.967266, current state peering, last
> acting [1,5]
> pg 3.e1 is stuck unclean for 116121.694416, current state peering, last
> acting [0,6]
> pg 3.1c is stuck unclean for 115916.431308, current state peering, last
> acting [8,3]
> pg 3.7e is stuck unclean for 115918.391137, current state peering, last
> acting [0,3]
> pg 3.18 is stuck unclean for 115908.251019, current state peering, last
> acting [8,6]
> pg 3.79 is stuck unclean for 115914.617864, current state peering, last
> acting [8,3]
> pg 3.d8 is stuck unclean for 116341.813466, current state peering, last
> acting [2,6]
> pg 3.1b is stuck unclean for 115905.061262, current state peering, last
> acting [7,4]
> pg 3.d9 is stuck unclean for 156650.199403, current state peering, last
> acting [8,3]
> pg 3.db is stuck unclean for 115915.924260, current state peering, last
> acting [1,5]
> pg 3.d4 is stuck unclean for 115918.396273, current state peering, last
> acting [0,3]
> pg 3.17 is stuck unclean for 115915.304951, current state peering, last
> acting [0,3]
> pg 3.70 is stuck unclean for 115915.000581, current state peering, last
> acting [7,6]
> pg 3.12 is stuck unclean for 115916.467142, current state peering, last
> acting [8,3]
> pg 3.13 is stuck unclean for 254650.057287, current state
> remapped+peering, last acting [6,0]
> pg 3.d2 is stuck unclean for 115913.708481, current state peering, last
> acting [8,3]
> pg 3.6d is stuck unclean for 115909.860380, current state peering, last
> acting [8,4]
> pg 3.6e is stuck unclean for 115914.617747, current state peering, last
> acting [8,3]
> pg 3.9 is stuck unclean for 255316.515662, current state remapped+peering,
> last acting [4,2]
> pg 3.68 is stuck unclean for 115916.701246, current state peering, last
> acting [7,3]
> pg 3.6a is stuck unclean for 115914.617775, current state peering, last
> acting [8,3]
> pg 3.4 is stuck unclean for 115913.708241, current state peering, last
> acting [8,3]
> pg 3.ca is stuck unclean for 115915.923915, current state peering, last
> acting [0,6]
> pg 3.64 is stuck unclean for 115905.061969, current state peering, last
> acting [7,4]
> pg 3.6 is stuck unclean for 115913.708264, current state peering, last
> acting [8,3]
> pg 3.0 is stuck unclean for 116106.189737, current state peering, last
> acting [8,6]
> pg 3.c6 is stuck unclean for 115905.061775, current state peering, last
> acting [7,4]
> pg 3.2 is stuck unclean for 116351.262155, current state peering, last
> acting [1,5]
> pg 3.61 is stuck unclean for 115913.854289, current state peering, last
> acting [0,6]
> pg 3.c0 is stuck unclean for 115916.700973, current state peering, last
> acting [7,3]
> pg 3.c2 is stuck unclean for 115913.708556, current state peering, last
> acting [8,6]
> pg 3.bd is stuck unclean for 115909.142373, current state peering, last
> acting [0,4]
> pg 3.58 is stuck unclean for 116290.453992, current state peering, last
> acting [2,6]
> pg 3.59 is stuck unclean for 156592.727616, current state peering, last
> acting [8,3]
> pg 3.5b is stuck unclean for 115915.927668, current state peering, last
> acting [1,5]
> pg 3.54 is stuck unclean for 115918.391323, current state peering, last
> acting [0,3]
> pg 3.bb is stuck unclean for 115918.138514, current state peering, last
> acting [0,3]
> pg 3.b5 is stuck unclean for 156609.811589, current state peering, last
> acting [7,3]
> pg 3.52 is stuck unclean for 115914.617914, current state peering, last
> acting [8,3]
> pg 3.b1 is stuck unclean for 115910.407700, current state peering, last
> acting [1,4]
> pg 3.b3 is stuck unclean for 116204.050364, current state peering, last
> acting [0,6]
> pg 3.af is stuck unclean for 115908.305031, current state peering, last
> acting [1,6]
> pg 3.a8 is stuck unclean for 115909.754082, current state peering, last
> acting [8,5]
> pg 3.4a is stuck unclean for 115913.854406, current state peering, last
> acting [0,6]
> pg 3.a9 is stuck unclean for 115905.061535, current state peering, last
> acting [7,4]
> pg 3.a4 is stuck unclean for 115909.754111, current state peering, last
> acting [8,4]
> pg 3.46 is stuck unclean for 115905.062087, current state peering, last
> acting [7,4]
> pg 3.40 is stuck unclean for 115916.701248, current state peering, last
> acting [7,3]
> pg 3.a0 is stuck unclean for 156540.416786, current state peering, last
> acting [7,6]
> pg 3.42 is stuck unclean for 116084.025844, current state peering, last
> acting [8,6]
> pg 3.a1 is stuck unclean for 115905.061597, current state peering, last
> acting [7,5]
> pg 3.a3 is stuck unclean for 156592.632868, current state peering, last
> acting [8,3]
> pg 3.3d is stuck unclean for 115909.536541, current state peering, last
> acting [0,4]
> pg 3.9c is stuck unclean for 115913.640165, current state peering, last
> acting [8,3]
> pg 3.fe is stuck unclean for 115915.304874, current state peering, last
> acting [0,3]
> pg 3.98 is stuck unclean for 115908.287885, current state peering, last
> acting [8,6]
> pg 3.f9 is stuck unclean for 115913.708390, current state peering, last
> acting [8,3]
> pg 3.3b is stuck unclean for 115915.304844, current state peering, last
> acting [0,3]
> pg 3.9b is stuck unclean for 115905.061638, current state peering, last
> acting [7,4]
> pg 3.35 is stuck unclean for 156760.780929, current state peering, last
> acting [7,3]
> pg 3.97 is stuck unclean for 115913.854229, current state peering, last
> acting [0,3]
> pg 3.31 is stuck unclean for 115910.565829, current state peering, last
> acting [1,4]
> pg 3.f0 is stuck unclean for 115915.000385, current state peering, last
> acting [7,6]
> pg 3.33 is stuck unclean for 115908.911591, current state peering, last
> acting [0,6]
> pg 3.92 is stuck unclean for 115914.503790, current state peering, last
> acting [8,3]
> pg 3.93 is stuck unclean for 254650.057387, current state
> remapped+peering, last acting [6,0]
> pg 3.2f is stuck unclean for 115980.326297, current state peering, last
> acting [1,6]
> pg 3.ed is stuck unclean for 115909.859881, current state peering, last
> acting [8,4]
> pg 3.28 is stuck unclean for 115913.708950, current state peering, last
> acting [8,5]
> pg 3.ee is stuck unclean for 115913.708477, current state peering, last
> acting [8,3]
> pg 3.29 is stuck unclean for 115905.062284, current state peering, last
> acting [7,4]
> pg 3.89 is stuck unclean for 255316.515766, current state
> remapped+peering, last acting [4,2]
> pg 3.e8 is stuck unclean for 115916.700921, current state peering, last
> acting [7,3]
> pg 3.24 is stuck unclean for 115909.860762, current state peering, last
> acting [8,4]
> pg 3.ea is stuck unclean for 115913.708507, current state peering, last
> acting [8,3]
> pg 3.84 is stuck unclean for 115913.708741, current state peering, last
> acting [8,3]
> pg 3.e4 is stuck unclean for 115905.061544, current state peering, last
> acting [7,4]
> pg 3.86 is stuck unclean for 115914.617912, current state peering, last
> acting [8,3]
> pg 3.20 is stuck unclean for 156654.164838, current state peering, last
> acting [7,6]
> pg 3.ed is peering, acting [8,4]
> pg 3.ee is peering, acting [8,3]
> pg 3.e8 is peering, acting [7,3]
> pg 3.ea is peering, acting [8,3]
> pg 3.e4 is peering, acting [7,4]
> pg 3.e1 is peering, acting [0,6]
> pg 3.d8 is peering, acting [2,6]
> pg 3.d9 is peering, acting [8,3]
> pg 3.db is peering, acting [1,5]
> pg 3.d4 is peering, acting [0,3]
> pg 3.d2 is peering, acting [8,3]
> pg 3.ca is peering, acting [0,6]
> pg 3.c6 is peering, acting [7,4]
> pg 3.c0 is peering, acting [7,3]
> pg 3.c2 is peering, acting [8,6]
> pg 3.bd is peering, acting [0,4]
> pg 3.bb is peering, acting [0,3]
> pg 3.b5 is peering, acting [7,3]
> pg 3.b1 is peering, acting [1,4]
> pg 3.b3 is peering, acting [0,6]
> pg 3.af is peering, acting [1,6]
> pg 3.a8 is peering, acting [8,5]
> pg 3.a9 is peering, acting [7,4]
> pg 3.a4 is peering, acting [8,4]
> pg 3.a0 is peering, acting [7,6]
> pg 3.a1 is peering, acting [7,5]
> pg 3.a3 is peering, acting [8,3]
> pg 3.9c is peering, acting [8,3]
> pg 3.98 is peering, acting [8,6]
> pg 3.9b is peering, acting [7,4]
> pg 3.97 is peering, acting [0,3]
> pg 3.92 is peering, acting [8,3]
> pg 3.93 is remapped+peering, acting [6,0]
> pg 3.89 is remapped+peering, acting [4,2]
> pg 3.84 is peering, acting [8,3]
> pg 3.86 is peering, acting [8,3]
> pg 3.80 is peering, acting [8,6]
> pg 3.82 is peering, acting [1,5]
> pg 3.7e is peering, acting [0,3]
> pg 3.79 is peering, acting [8,3]
> pg 3.70 is peering, acting [7,6]
> pg 3.6d is peering, acting [8,4]
> pg 3.6e is peering, acting [8,3]
> pg 3.68 is peering, acting [7,3]
> pg 3.6a is peering, acting [8,3]
> pg 3.64 is peering, acting [7,4]
> pg 3.61 is peering, acting [0,6]
> pg 3.58 is peering, acting [2,6]
> pg 3.59 is peering, acting [8,3]
> pg 3.5b is peering, acting [1,5]
> pg 3.54 is peering, acting [0,3]
> pg 3.52 is peering, acting [8,3]
> pg 3.4a is peering, acting [0,6]
> pg 3.46 is peering, acting [7,4]
> pg 3.40 is peering, acting [7,3]
> pg 3.42 is peering, acting [8,6]
> pg 3.3d is peering, acting [0,4]
> pg 3.3b is peering, acting [0,3]
> pg 3.35 is peering, acting [7,3]
> pg 3.31 is peering, acting [1,4]
> pg 3.33 is peering, acting [0,6]
> pg 3.2f is peering, acting [1,6]
> pg 3.28 is peering, acting [8,5]
> pg 3.29 is peering, acting [7,4]
> pg 3.24 is peering, acting [8,4]
> pg 3.20 is peering, acting [7,6]
> pg 3.21 is peering, acting [7,5]
> pg 3.23 is peering, acting [8,3]
> pg 3.1c is peering, acting [8,3]
> pg 3.18 is peering, acting [8,6]
> pg 3.1b is peering, acting [7,4]
> pg 3.17 is peering, acting [0,3]
> pg 3.12 is peering, acting [8,3]
> pg 3.13 is remapped+peering, acting [6,0]
> pg 3.9 is remapped+peering, acting [4,2]
> pg 3.4 is peering, acting [8,3]
> pg 3.6 is peering, acting [8,3]
> pg 3.0 is peering, acting [8,6]
> pg 3.2 is peering, acting [1,5]
> pg 3.fe is peering, acting [0,3]
> pg 3.f9 is peering, acting [8,3]
> pg 3.f0 is peering, acting [7,6]
> 1 ops are blocked > 134218 sec
> 1 ops are blocked > 134218 sec on osd.8
> 1 osds have slow requests
> pool images pg_num 256 > pgp_num 128
>
>
>
> and this is the output of ceph pg 3.a9 query. The stats section looks
> important: num_bytes_recovered (54922698769) is about three times
> num_bytes (18268690441), the size of the pg.
> {
>     "state": "peering",
>     "snap_trimq": "[]",
>     "epoch": 211256,
>     "up": [
>         7,
>         4
>     ],
>     "acting": [
>         7,
>         4
>     ],
>     "info": {
>         "pgid": "3.a9",
>         "last_update": "3359'110581",
>         "last_complete": "3359'110581",
>         "log_tail": "850'107578",
>         "last_user_version": 110581,
>         "last_backfill": "MAX",
>         "purged_snaps": "[]",
>         "history": {
>             "epoch_created": 31,
>             "last_epoch_started": 116841,
>             "last_epoch_clean": 116844,
>             "last_epoch_split": 0,
>             "same_up_since": 116838,
>             "same_interval_since": 126562,
>             "same_primary_since": 1202,
>             "last_scrub": "3359'110581",
>             "last_scrub_stamp": "2015-11-13 13:22:55.682647",
>             "last_deep_scrub": "987'109658",
>             "last_deep_scrub_stamp": "2015-11-09 13:56:36.850047",
>             "last_clean_scrub_stamp": "2015-11-13 13:22:55.682647"
>         },
>         "stats": {
>             "version": "3359'110581",
>             "reported_seq": "103843",
>             "reported_epoch": "211192",
>             "state": "peering",
>             "last_fresh": "2015-11-15 17:45:30.009129",
>             "last_change": "2015-11-14 11:25:20.451898",
>             "last_active": "2015-11-14 09:35:03.312840",
>             "last_peered": "2015-11-14 09:35:03.312840",
>             "last_clean": "2015-11-14 09:35:03.312840",
>             "last_became_active": "0.000000",
>             "last_became_peered": "0.000000",
>             "last_unstale": "2015-11-15 17:45:30.009129",
>             "last_undegraded": "2015-11-15 17:45:30.009129",
>             "last_fullsized": "2015-11-15 17:45:30.009129",
>             "mapping_epoch": 94611,
>             "log_start": "850'107578",
>             "ondisk_log_start": "850'107578",
>             "created": 31,
>             "last_epoch_clean": 116844,
>             "parent": "0.0",
>             "parent_split_bits": 0,
>             "last_scrub": "3359'110581",
>             "last_scrub_stamp": "2015-11-13 13:22:55.682647",
>             "last_deep_scrub": "987'109658",
>             "last_deep_scrub_stamp": "2015-11-09 13:56:36.850047",
>             "last_clean_scrub_stamp": "2015-11-13 13:22:55.682647",
>             "log_size": 3003,
>             "ondisk_log_size": 3003,
>             "stats_invalid": "1",
>             "stat_sum": {
>                 "num_bytes": 18268690441,
>                 "num_objects": 4402,
>                 "num_object_clones": 0,
>                 "num_object_copies": 8804,
>                 "num_objects_missing_on_primary": 0,
>                 "num_objects_degraded": 0,
>                 "num_objects_misplaced": 0,
>                 "num_objects_unfound": 0,
>                 "num_objects_dirty": 4402,
>                 "num_whiteouts": 0,
>                 "num_read": 2268,
>                 "num_read_kb": 31055,
>                 "num_write": 8111,
>                 "num_write_kb": 1762444,
>                 "num_scrub_errors": 0,
>                 "num_shallow_scrub_errors": 0,
>                 "num_deep_scrub_errors": 0,
>                 "num_objects_recovered": 13228,
>                 "num_bytes_recovered": 54922698769,
>                 "num_keys_recovered": 0,
>                 "num_objects_omap": 0,
>                 "num_objects_hit_set_archive": 0,
>                 "num_bytes_hit_set_archive": 0
>             },
>             "up": [
>                 7,
>                 4
>             ],
>             "acting": [
>                 7,
>                 4
>             ],
>             "blocked_by": [
>                 4
>             ],
>             "up_primary": 7,
>             "acting_primary": 7
>         },
>         "empty": 0,
>         "dne": 0,
>         "incomplete": 0,
>         "last_epoch_started": 116841,
>         "hit_set_history": {
>             "current_last_update": "0'0",
>             "current_last_stamp": "0.000000",
>             "current_info": {
>                 "begin": "0.000000",
>                 "end": "0.000000",
>                 "version": "0'0"
>             },
>             "history": []
>         }
>     },
>     "peer_info": [],
>     "recovery_state": [
>         {
>             "name": "Started\/Primary\/Peering\/GetInfo",
>             "enter_time": "2015-11-14 11:25:20.451888",
>             "requested_info_from": [
>                 {
>                     "osd": "4"
>                 }
>             ]
>         },
>         {
>             "name": "Started\/Primary\/Peering",
>             "enter_time": "2015-11-14 11:25:20.451882",
>             "past_intervals": [
>                 {
>                     "first": 116838,
>                     "last": 120813,
>                     "maybe_went_rw": 1,
>                     "up": [
>                         7,
>                         4
>                     ],
>                     "acting": [
>                         7,
>                         4
>                     ],
>                     "primary": 7,
>                     "up_primary": 7
>                 },
>                 {
>                     "first": 120814,
>                     "last": 120889,
>                     "maybe_went_rw": 1,
>                     "up": [
>                         7,
>                         4
>                     ],
>                     "acting": [
>                         7,
>                         4
>                     ],
>                     "primary": 7,
>                     "up_primary": 7
>                 },
>                 {
>                     "first": 120890,
>                     "last": 126561,
>                     "maybe_went_rw": 1,
>                     "up": [
>                         7,
>                         4
>                     ],
>                     "acting": [
>                         7,
>                         4
>                     ],
>                     "primary": 7,
>                     "up_primary": 7
>                 }
>             ],
>             "probing_osds": [
>                 "4",
>                 "7"
>             ],
>             "down_osds_we_would_probe": [],
>             "peering_blocked_by": []
>         },
>         {
>             "name": "Started",
>             "enter_time": "2015-11-14 11:25:20.451851"
>         }
>     ],
>     "agent_state": {}
> }
>
>
> Regards
> Pete
>
> On 15 November 2015 at 01:26, Peter Theobald <[email protected]> wrote:
>
>> Hi Gregory,
>> This is the output of ceph -s
>>     cluster 5400bbc9-378d-4c69-afc4-da71393f7baf
>>      health HEALTH_WARN
>>             82 pgs peering
>>             82 pgs stuck inactive
>>             82 pgs stuck unclean
>>             1 requests are blocked > 32 sec
>>             pool images pg_num 256 > pgp_num 128
>>      monmap e2: 2 mons at {0=192.168.2.1:6789/0,1=192.168.2.3:6789/0}
>>             election epoch 16, quorum 0,1 0,1
>>      osdmap e168004: 9 osds: 9 up, 9 in; 4 remapped pgs
>>       pgmap v1317963: 256 pgs, 1 pools, 4377 GB data, 1105 kobjects
>>             8792 GB used, 15369 GB / 24162 GB avail
>>                  174 active+clean
>>                   78 peering
>>                    4 remapped+peering
>>
>>
>> Total available space is about 24TB. Used space is about 8TB at a
>> replication level of 2.
>>
>> Regards
>> Pete
>>
>> On 14 November 2015 at 18:03, Gregory Farnum <[email protected]> wrote:
>>
>>> What's the full output of "ceph -s"? Are your new crush rules actually
>>> satisfiable? Is your cluster filling up?
>>> -Greg
>>>
>>>
>>> On Saturday, November 14, 2015, Peter Theobald <[email protected]>
>>> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I have a 3 node ceph cluster with a total of 9 osds (2, 3 and 4 with
>>>> different size drives). I changed the layout (failure domain from per osd
>>>> to per host, and changed min_size) and I now have a few pgs that have been
>>>> stuck in peering or remapped+peering for a couple of days.
>>>>
>>>> The hosts are underpowered: 2x HP MicroServers and a single i5 desktop
>>>> grade machine, so not super powerful. The network is fast though (bonded
>>>> gigabit ethernet with a dedicated switch).
>>>>
>>>> I'm concerned that the remapped+peering pgs are stuck. All the pgs in
>>>> peering or remapped+peering are stuck inactive and unclean, so I'm concerned
>>>> about data loss. Do I just need to wait for them to fix themselves? I
>>>> cannot see any mention of unfound objects when I query the remapped pgs, so
>>>> I think I'm ok and just need to be patient. I have 128 pgs across 9 osds so
>>>> probably have a lot of objects per pg. Total data is about 4TB.
>>>>
>>>> Regards
>>>>
>>>> Pete
>>>>
>>>
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
