Hi,
You have a problem with the MGR.
http://docs.ceph.com/docs/master/rados/operations/pg-states/
"The ceph-mgr hasn't yet received any information about the PG's state from an OSD since mgr started up."
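If it really is just stale mgr state, restarting or failing over the active mgr usually clears the "unknown" PGs once OSDs report in again. A minimal sketch, assuming systemd-managed daemons and the hostnames from your output (adjust the mgr name and unit instance to your cluster):

```shell
# Hand the active role to a standby; the newly active mgr
# will collect fresh PG stats from the OSDs:
ceph mgr fail yak0.planwerk6.de

# Or restart the mgr daemon directly on its host:
systemctl restart ceph-mgr@yak0

# Then watch the pg summary; the "unknown" count should drop
# as OSDs report to the new mgr:
ceph -s
```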
On Wed, 20 Feb 2019 at 23:10, Ranjan Ghosh <[email protected]> wrote:
Hi all,
hope someone can help me. After restarting a node of my 2-node cluster, I suddenly get this:
root@yak2 /var/www/projects # ceph -s
  cluster:
    id:     749b2473-9300-4535-97a6-ee6d55008a1b
    health: HEALTH_WARN
            Reduced data availability: 200 pgs inactive

  services:
    mon: 3 daemons, quorum yak1,yak2,yak0
    mgr: yak0.planwerk6.de(active), standbys: yak1.planwerk6.de, yak2.planwerk6.de
    mds: cephfs-1/1/1 up {0=yak1.planwerk6.de=up:active}, 1 up:standby
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   2 pools, 200 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             200 unknown
And this:
root@yak2 /var/www/projects # ceph health detail
HEALTH_WARN Reduced data availability: 200 pgs inactive
PG_AVAILABILITY Reduced data availability: 200 pgs inactive
    pg 1.34 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.35 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.36 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.37 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.38 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.39 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3c is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3d is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3e is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3f is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.40 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.41 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.42 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.43 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.44 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.45 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.46 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.47 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.48 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.49 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4c is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4d is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.34 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.35 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.36 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.38 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.39 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3c is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3d is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3e is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3f is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.40 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.41 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.42 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.43 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.44 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.45 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.46 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.47 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.48 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.49 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4e is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4f is stuck inactive for 3506.815664, current state unknown, last acting []
But if I query an individual PG I get this:
root@yak1 /var/www/projects # ceph pg 1.49 query
{
"state": "active+clean",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 162,
"up": [
0,
1
],
"acting": [
0,
1
],
"acting_recovery_backfill": [
"0",
"1"
],
"info": {
"pgid": "1.49",
"last_update": "127'38077",
"last_complete": "127'38077",
"log_tail": "127'35000",
"last_user_version": 38077,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 10,
"epoch_pool_created": 10,
"last_epoch_started": 159,
"last_interval_started": 158,
"last_epoch_clean": 159,
"last_interval_clean": 158,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 158,
"same_interval_since": 158,
"same_primary_since": 135,
"last_scrub": "127'36909",
"last_scrub_stamp": "2019-02-20 15:02:45.204342",
"last_deep_scrub": "127'36714",
"last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
"last_clean_scrub_stamp": "2019-02-20 15:02:45.204342"
},
"stats": {
"version": "127'38077",
"reported_seq": "58934",
"reported_epoch": "162",
"state": "active+clean",
"last_fresh": "2019-02-20 19:56:56.740536",
"last_change": "2019-02-20 19:52:27.063812",
"last_active": "2019-02-20 19:56:56.740536",
"last_peered": "2019-02-20 19:56:56.740536",
"last_clean": "2019-02-20 19:56:56.740536",
"last_became_active": "2019-02-20 19:52:27.062689",
"last_became_peered": "2019-02-20 19:52:27.062689",
"last_unstale": "2019-02-20 19:56:56.740536",
"last_undegraded": "2019-02-20 19:56:56.740536",
"last_fullsized": "2019-02-20 19:56:56.740536",
"mapping_epoch": 158,
"log_start": "127'35000",
"ondisk_log_start": "127'35000",
"created": 10,
"last_epoch_clean": 159,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "127'36909",
"last_scrub_stamp": "2019-02-20 15:02:45.204342",
"last_deep_scrub": "127'36714",
"last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
"last_clean_scrub_stamp": "2019-02-20 15:02:45.204342",
"log_size": 3077,
"ondisk_log_size": 3077,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 478347970,
"num_objects": 12052,
"num_object_clones": 0,
"num_object_copies": 24104,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 12052,
"num_whiteouts": 0,
"num_read": 20186,
"num_read_kb": 1952018,
"num_write": 38927,
"num_write_kb": 484756,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 6,
"num_bytes_recovered": 4101,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0,
"num_large_omap_objects": 0,
"num_objects_manifest": 0
},
"up": [
0,
1
],
"acting": [
0,
1
],
"blocked_by": [],
"up_primary": 0,
"acting_primary": 0,
"purged_snaps": []
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 159,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
"peer_info": [
{
"peer": "1",
"pgid": "1.49",
"last_update": "127'38077",
"last_complete": "127'38077",
"log_tail": "127'35000",
"last_user_version": 38077,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 10,
"epoch_pool_created": 10,
"last_epoch_started": 159,
"last_interval_started": 158,
"last_epoch_clean": 159,
"last_interval_clean": 158,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 158,
"same_interval_since": 158,
"same_primary_since": 135,
"last_scrub": "127'36909",
"last_scrub_stamp": "2019-02-20 15:02:45.204342",
"last_deep_scrub": "127'36714",
"last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
"last_clean_scrub_stamp": "2019-02-20 15:02:45.204342"
},
"stats": {
"version": "127'38077",
"reported_seq": "58745",
"reported_epoch": "134",
"state": "active+undersized+degraded",
"last_fresh": "2019-02-20 19:06:19.180016",
"last_change": "2019-02-20 19:04:39.483332",
"last_active": "2019-02-20 19:06:19.180016",
"last_peered": "2019-02-20 19:06:19.180016",
"last_clean": "2019-02-20 18:23:33.675145",
"last_became_active": "2019-02-20 19:04:39.483332",
"last_became_peered": "2019-02-20 19:04:39.483332",
"last_unstale": "2019-02-20 19:06:19.180016",
"last_undegraded": "2019-02-20 19:04:39.477829",
"last_fullsized": "2019-02-20 19:04:39.477717",
"mapping_epoch": 158,
"log_start": "127'35000",
"ondisk_log_start": "127'35000",
"created": 10,
"last_epoch_clean": 124,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "127'36909",
"last_scrub_stamp": "2019-02-20 15:02:45.204342",
"last_deep_scrub": "127'36714",
"last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
"last_clean_scrub_stamp": "2019-02-20 15:02:45.204342",
"log_size": 3077,
"ondisk_log_size": 3077,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": true,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 478347970,
"num_objects": 12052,
"num_object_clones": 0,
"num_object_copies": 24104,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 12052,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 12052,
"num_whiteouts": 0,
"num_read": 20186,
"num_read_kb": 1952018,
"num_write": 38927,
"num_write_kb": 484756,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 6,
"num_bytes_recovered": 4101,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0,
"num_large_omap_objects": 0,
"num_objects_manifest": 0
},
"up": [
0,
1
],
"acting": [
0,
1
],
"blocked_by": [],
"up_primary": 0,
"acting_primary": 0,
"purged_snaps": []
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 159,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
],
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2019-02-20 19:52:27.027151",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [],
"waiting_on_backfill": [],
"last_backfill_started": "MIN",
"backfill_info": {
"begin": "MIN",
"end": "MIN",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "0",
"scrubber.active": false,
"scrubber.state": "INACTIVE",
"scrubber.start": "MIN",
"scrubber.end": "MIN",
"scrubber.max_end": "MIN",
"scrubber.subset_last_update": "0'0",
"scrubber.deep": false,
"scrubber.waiting_on_whom": []
}
},
{
"name": "Started",
"enter_time": "2019-02-20 19:52:25.976144"
}
],
"agent_state": {}
}
I wonder what it all means and how to get out of this situation. The cluster seems to work normally, but it's quite disconcerting, as you can probably imagine. Could it be a firewall issue? I'm not aware of any changes, and I don't see any peering problems...
Thank you
Ranjan
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com