Hi,

You have a problem with the MGR, not with the PGs themselves. See
http://docs.ceph.com/docs/master/rados/operations/pg-states/ on the
"unknown" state: *The ceph-mgr hasn’t yet received any information about
the PG’s state from an OSD since mgr started up.*
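If the active mgr is the problem, failing it over to a standby (or restarting its daemon) and then re-checking `ceph -s` usually clears the bogus "unknown" state. A minimal sketch, assuming systemd-managed daemons; the mgr name and unit name below are taken from your `ceph -s` output and may need adjusting to your deployment:

```shell
# Fail over the active mgr so a standby takes over (non-destructive;
# the failed mgr respawns and becomes a standby itself):
ceph mgr fail yak0.planwerk6.de

# Alternatively, restart the mgr daemon on its host (unit name is an
# assumption -- check with: systemctl list-units 'ceph-mgr*'):
systemctl restart ceph-mgr@yak0.planwerk6.de

# Give it a minute, then re-check; PGs should report real states again:
ceph -s
ceph health detail
```

Since the PGs themselves answer `ceph pg ... query` with active+clean, only the mgr's view is stale, so this is safe to try first.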
On Thu, 21 Feb 2019 at 09:04, Irek Fasikhov <[email protected]> wrote:

> Hi,
>
> You have problems with MGR.
> http://docs.ceph.com/docs/master/rados/operations/pg-states/
> *The ceph-mgr hasn’t yet received any information about the PG’s state
> from an OSD since mgr started up.*
>
>
> On Wed, 20 Feb 2019 at 23:10, Ranjan Ghosh <[email protected]> wrote:
>
>> Hi all,
>>
>> hope someone can help me. After restarting a node of my 2-node cluster
>> I suddenly get this:
>>
>> root@yak2 /var/www/projects # ceph -s
>>   cluster:
>>     id:     749b2473-9300-4535-97a6-ee6d55008a1b
>>     health: HEALTH_WARN
>>             Reduced data availability: 200 pgs inactive
>>
>>   services:
>>     mon: 3 daemons, quorum yak1,yak2,yak0
>>     mgr: yak0.planwerk6.de(active), standbys: yak1.planwerk6.de,
>> yak2.planwerk6.de
>>     mds: cephfs-1/1/1 up {0=yak1.planwerk6.de=up:active}, 1 up:standby
>>     osd: 2 osds: 2 up, 2 in
>>
>>   data:
>>     pools:   2 pools, 200 pgs
>>     objects: 0 objects, 0 B
>>     usage:   0 B used, 0 B / 0 B avail
>>     pgs:     100.000% pgs unknown
>>              200 unknown
>>
>> And this:
>>
>> root@yak2 /var/www/projects # ceph health detail
>> HEALTH_WARN Reduced data availability: 200 pgs inactive
>> PG_AVAILABILITY Reduced data availability: 200 pgs inactive
>>     pg 1.34 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.35 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.36 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.37 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.38 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.39 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.3a is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.3b is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.3c is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.3d is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.3e is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.3f is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.40 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.41 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.42 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.43 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.44 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.45 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.46 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.47 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.48 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.49 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.4a is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.4b is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.4c is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 1.4d is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.34 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.35 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.36 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.38 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.39 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.3a is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.3b is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.3c is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.3d is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.3e is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.3f is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.40 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.41 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.42 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.43 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.44 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.45 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.46 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.47 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.48 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.49 is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.4a is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.4b is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.4e is stuck inactive for 3506.815664, current state unknown, last acting []
>>     pg 2.4f is stuck inactive for 3506.815664, current state unknown, last acting []
>>
>> But if I query an individual PG I get this:
>>
>> root@yak1 /var/www/projects # ceph pg 1.49 query
>> {
>>     "state": "active+clean",
>>     "snap_trimq": "[]",
>>     "snap_trimq_len": 0,
>>     "epoch": 162,
>>     "up": [
>>         0,
>>         1
>>     ],
>>     "acting": [
>>         0,
>>         1
>>     ],
>>     "acting_recovery_backfill": [
>>         "0",
>>         "1"
>>     ],
>>     "info": {
>>         "pgid": "1.49",
>>         "last_update": "127'38077",
>>         "last_complete": "127'38077",
>>         "log_tail": "127'35000",
>>         "last_user_version": 38077,
>>         "last_backfill": "MAX",
>>         "last_backfill_bitwise": 0,
>>         "purged_snaps": [],
>>         "history": {
>>             "epoch_created": 10,
>>             "epoch_pool_created": 10,
>>             "last_epoch_started": 159,
>>             "last_interval_started": 158,
>>             "last_epoch_clean": 159,
>>             "last_interval_clean": 158,
>>             "last_epoch_split": 0,
>>             "last_epoch_marked_full": 0,
>>             "same_up_since": 158,
>>             "same_interval_since": 158,
>>             "same_primary_since": 135,
>>             "last_scrub": "127'36909",
>>             "last_scrub_stamp": "2019-02-20 15:02:45.204342",
>>             "last_deep_scrub": "127'36714",
>>             "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
>>             "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342"
>>         },
>>         "stats": {
>>             "version": "127'38077",
>>             "reported_seq": "58934",
>>             "reported_epoch": "162",
>>             "state": "active+clean",
>>             "last_fresh": "2019-02-20 19:56:56.740536",
>>             "last_change": "2019-02-20 19:52:27.063812",
>>             "last_active": "2019-02-20 19:56:56.740536",
>>             "last_peered": "2019-02-20 19:56:56.740536",
>>             "last_clean": "2019-02-20 19:56:56.740536",
>>             "last_became_active": "2019-02-20 19:52:27.062689",
>>             "last_became_peered": "2019-02-20 19:52:27.062689",
>>             "last_unstale": "2019-02-20 19:56:56.740536",
>>             "last_undegraded": "2019-02-20 19:56:56.740536",
>>             "last_fullsized": "2019-02-20 19:56:56.740536",
>>             "mapping_epoch": 158,
>>             "log_start": "127'35000",
>>             "ondisk_log_start": "127'35000",
>>             "created": 10,
>>             "last_epoch_clean": 159,
>>             "parent": "0.0",
>>             "parent_split_bits": 0,
>>             "last_scrub": "127'36909",
>>             "last_scrub_stamp": "2019-02-20 15:02:45.204342",
>>             "last_deep_scrub": "127'36714",
>>             "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
>>             "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342",
>>             "log_size": 3077,
>>             "ondisk_log_size": 3077,
>>             "stats_invalid": false,
>>             "dirty_stats_invalid": false,
>>             "omap_stats_invalid": false,
>>             "hitset_stats_invalid": false,
>>             "hitset_bytes_stats_invalid": false,
>>             "pin_stats_invalid": false,
>>             "manifest_stats_invalid": true,
>>             "snaptrimq_len": 0,
>>             "stat_sum": {
>>                 "num_bytes": 478347970,
>>                 "num_objects": 12052,
>>                 "num_object_clones": 0,
>>                 "num_object_copies": 24104,
>>                 "num_objects_missing_on_primary": 0,
>>                 "num_objects_missing": 0,
>>                 "num_objects_degraded": 0,
>>                 "num_objects_misplaced": 0,
>>                 "num_objects_unfound": 0,
>>                 "num_objects_dirty": 12052,
>>                 "num_whiteouts": 0,
>>                 "num_read": 20186,
>>                 "num_read_kb": 1952018,
>>                 "num_write": 38927,
>>                 "num_write_kb": 484756,
>>                 "num_scrub_errors": 0,
>>                 "num_shallow_scrub_errors": 0,
>>                 "num_deep_scrub_errors": 0,
>>                 "num_objects_recovered": 6,
>>                 "num_bytes_recovered": 4101,
>>                 "num_keys_recovered": 0,
>>                 "num_objects_omap": 0,
>>                 "num_objects_hit_set_archive": 0,
>>                 "num_bytes_hit_set_archive": 0,
>>                 "num_flush": 0,
>>                 "num_flush_kb": 0,
>>                 "num_evict": 0,
>>                 "num_evict_kb": 0,
>>                 "num_promote": 0,
>>                 "num_flush_mode_high": 0,
>>                 "num_flush_mode_low": 0,
>>                 "num_evict_mode_some": 0,
>>                 "num_evict_mode_full": 0,
>>                 "num_objects_pinned": 0,
>>                 "num_legacy_snapsets": 0,
>>                 "num_large_omap_objects": 0,
>>                 "num_objects_manifest": 0
>>             },
>>             "up": [
>>                 0,
>>                 1
>>             ],
>>             "acting": [
>>                 0,
>>                 1
>>             ],
>>             "blocked_by": [],
>>             "up_primary": 0,
>>             "acting_primary": 0,
>>             "purged_snaps": []
>>         },
>>         "empty": 0,
>>         "dne": 0,
>>         "incomplete": 0,
>>         "last_epoch_started": 159,
>>         "hit_set_history": {
>>             "current_last_update": "0'0",
>>             "history": []
>>         }
>>     },
>>     "peer_info": [
>>         {
>>             "peer": "1",
>>             "pgid": "1.49",
>>             "last_update": "127'38077",
>>             "last_complete": "127'38077",
>>             "log_tail": "127'35000",
>>             "last_user_version": 38077,
>>             "last_backfill": "MAX",
>>             "last_backfill_bitwise": 0,
>>             "purged_snaps": [],
>>             "history": {
>>                 "epoch_created": 10,
>>                 "epoch_pool_created": 10,
>>                 "last_epoch_started": 159,
>>                 "last_interval_started": 158,
>>                 "last_epoch_clean": 159,
>>                 "last_interval_clean": 158,
>>                 "last_epoch_split": 0,
>>                 "last_epoch_marked_full": 0,
>>                 "same_up_since": 158,
>>                 "same_interval_since": 158,
>>                 "same_primary_since": 135,
>>                 "last_scrub": "127'36909",
>>                 "last_scrub_stamp": "2019-02-20 15:02:45.204342",
>>                 "last_deep_scrub": "127'36714",
>>                 "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
>>                 "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342"
>>             },
>>             "stats": {
>>                 "version": "127'38077",
>>                 "reported_seq": "58745",
>>                 "reported_epoch": "134",
>>                 "state": "active+undersized+degraded",
>>                 "last_fresh": "2019-02-20 19:06:19.180016",
>>                 "last_change": "2019-02-20 19:04:39.483332",
>>                 "last_active": "2019-02-20 19:06:19.180016",
>>                 "last_peered": "2019-02-20 19:06:19.180016",
>>                 "last_clean": "2019-02-20 18:23:33.675145",
>>                 "last_became_active": "2019-02-20 19:04:39.483332",
>>                 "last_became_peered": "2019-02-20 19:04:39.483332",
>>                 "last_unstale": "2019-02-20 19:06:19.180016",
>>                 "last_undegraded": "2019-02-20 19:04:39.477829",
>>                 "last_fullsized": "2019-02-20 19:04:39.477717",
>>                 "mapping_epoch": 158,
>>                 "log_start": "127'35000",
>>                 "ondisk_log_start": "127'35000",
>>                 "created": 10,
>>                 "last_epoch_clean": 124,
>>                 "parent": "0.0",
>>                 "parent_split_bits": 0,
>>                 "last_scrub": "127'36909",
>>                 "last_scrub_stamp": "2019-02-20 15:02:45.204342",
>>                 "last_deep_scrub": "127'36714",
>>                 "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
>>                 "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342",
>>                 "log_size": 3077,
>>                 "ondisk_log_size": 3077,
>>                 "stats_invalid": false,
>>                 "dirty_stats_invalid": false,
>>                 "omap_stats_invalid": false,
>>                 "hitset_stats_invalid": false,
>>                 "hitset_bytes_stats_invalid": false,
>>                 "pin_stats_invalid": false,
>>                 "manifest_stats_invalid": true,
>>                 "snaptrimq_len": 0,
>>                 "stat_sum": {
>>                     "num_bytes": 478347970,
>>                     "num_objects": 12052,
>>                     "num_object_clones": 0,
>>                     "num_object_copies": 24104,
>>                     "num_objects_missing_on_primary": 0,
>>                     "num_objects_missing": 0,
>>                     "num_objects_degraded": 12052,
>>                     "num_objects_misplaced": 0,
>>                     "num_objects_unfound": 0,
>>                     "num_objects_dirty": 12052,
>>                     "num_whiteouts": 0,
>>                     "num_read": 20186,
>>                     "num_read_kb": 1952018,
>>                     "num_write": 38927,
>>                     "num_write_kb": 484756,
>>                     "num_scrub_errors": 0,
>>                     "num_shallow_scrub_errors": 0,
>>                     "num_deep_scrub_errors": 0,
>>                     "num_objects_recovered": 6,
>>                     "num_bytes_recovered": 4101,
>>                     "num_keys_recovered": 0,
>>                     "num_objects_omap": 0,
>>                     "num_objects_hit_set_archive": 0,
>>                     "num_bytes_hit_set_archive": 0,
>>                     "num_flush": 0,
>>                     "num_flush_kb": 0,
>>                     "num_evict": 0,
>>                     "num_evict_kb": 0,
>>                     "num_promote": 0,
>>                     "num_flush_mode_high": 0,
>>                     "num_flush_mode_low": 0,
>>                     "num_evict_mode_some": 0,
>>                     "num_evict_mode_full": 0,
>>                     "num_objects_pinned": 0,
>>                     "num_legacy_snapsets": 0,
>>                     "num_large_omap_objects": 0,
>>                     "num_objects_manifest": 0
>>                 },
>>                 "up": [
>>                     0,
>>                     1
>>                 ],
>>                 "acting": [
>>                     0,
>>                     1
>>                 ],
>>                 "blocked_by": [],
>>                 "up_primary": 0,
>>                 "acting_primary": 0,
>>                 "purged_snaps": []
>>             },
>>             "empty": 0,
>>             "dne": 0,
>>             "incomplete": 0,
>>             "last_epoch_started": 159,
>>             "hit_set_history": {
>>                 "current_last_update": "0'0",
>>                 "history": []
>>             }
>>         }
>>     ],
>>     "recovery_state": [
>>         {
>>             "name": "Started/Primary/Active",
>>             "enter_time": "2019-02-20 19:52:27.027151",
>>             "might_have_unfound": [],
>>             "recovery_progress": {
>>                 "backfill_targets": [],
>>                 "waiting_on_backfill": [],
>>                 "last_backfill_started": "MIN",
>>                 "backfill_info": {
>>                     "begin": "MIN",
>>                     "end": "MIN",
>>                     "objects": []
>>                 },
>>                 "peer_backfill_info": [],
>>                 "backfills_in_flight": [],
>>                 "recovering": [],
>>                 "pg_backend": {
>>                     "pull_from_peer": [],
>>                     "pushing": []
>>                 }
>>             },
>>             "scrub": {
>>                 "scrubber.epoch_start": "0",
>>                 "scrubber.active": false,
>>                 "scrubber.state": "INACTIVE",
>>                 "scrubber.start": "MIN",
>>                 "scrubber.end": "MIN",
>>                 "scrubber.max_end": "MIN",
>>                 "scrubber.subset_last_update": "0'0",
>>                 "scrubber.deep": false,
>>                 "scrubber.waiting_on_whom": []
>>             }
>>         },
>>         {
>>             "name": "Started",
>>             "enter_time": "2019-02-20 19:52:25.976144"
>>         }
>>     ],
>>     "agent_state": {}
>> }
>>
>> I wonder what it all means and how to get out of this situation. The
>> cluster seems to work normally. But it's quite disconcerting, as you can
>> probably imagine. Could it be a firewall issue? I'm not aware of any
>> changes and I don't see any peering problems...
>>
>> Thank you
>>
>> Ranjan
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
