Hi Craig Lewis,
My pool holds 300 TB of data, so I can't recreate a new pool and copy the
data across with "rados cppool" (it would take a very long time).
I upgraded Ceph to Giant (0.86), but the error persists :((
I think my problem is the "objects misplaced (0.320%)".
# ceph pg 23.96 query
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 79,
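In case it helps anyone following along, here is a minimal sketch of pulling those counters out of `ceph pg <pgid> query --format json`. The field path ("info" → "stats" → "stat_sum") is assumed from the Giant-era output quoted above and may differ in other releases:

```python
import json

def misplaced_counts(pg_query):
    """Extract per-PG object counters from parsed `ceph pg query` JSON.

    Assumes the counters live under info.stats.stat_sum, as in the
    Giant-era output above; missing keys default to 0.
    """
    stat_sum = pg_query["info"]["stats"]["stat_sum"]
    return {
        "missing_on_primary": stat_sum.get("num_objects_missing_on_primary", 0),
        "degraded": stat_sum.get("num_objects_degraded", 0),
        "misplaced": stat_sum.get("num_objects_misplaced", 0),
    }
```

You would feed it `json.loads()` of the command output, e.g. `misplaced_counts(json.loads(subprocess.check_output(["ceph", "pg", "23.96", "query", "--format", "json"])))`.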
cluster xxxxxx-xxxxx-xxxxx-xxxxx
health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck
degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs
undersized; recovery 308759/54799506 objects degraded (0.563%);
175270/54799506 objects misplaced (0.320%); 1/130 in osds are down;
flags noout,nodeep-scrub
pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
206 TB used, 245 TB / 452 TB avail
308759/54799506 objects degraded (0.563%); 175270/54799506 objects
misplaced (0.320%)
14708 active+clean
38 active+remapped
225 active+undersized+degraded
client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s
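To enumerate exactly which PGs are behind those summary counts, a hedged sketch filtering `ceph pg dump --format json` (assuming the Giant-era layout with a top-level "pg_stats" list of {"pgid", "state", ...} entries):

```python
def stuck_pgs(pg_dump, bad=("degraded", "undersized", "remapped")):
    """Return the sorted pgids whose state string contains any of the
    given substrings, from parsed `ceph pg dump --format json` output.

    The "pg_stats"/"pgid"/"state" layout is assumed from Giant-era
    output and may differ between Ceph releases.
    """
    return sorted(pg["pgid"] for pg in pg_dump["pg_stats"]
                  if any(b in pg["state"] for b in bad))
```

Against a live cluster you would call it as `stuck_pgs(json.loads(subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])))` and then `ceph pg query` each result.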
Checking the Ceph log:
2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718
pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715
n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1
lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter
Started/ReplicaActive/RepNotRecovering
It then logs many failures on many objects (e.g.
c03fe096/rbd_data.5348922ae8944a.000000000000306b, ...):
2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793,
time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96
103718
[PushOp(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24,
version: 103622'283374, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24@103622'283374,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:,
omap_complete:false)),PushOp(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24,
version: 103679'295624, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24@103679'295624,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:, omap_complete:false))])
Thanks!
--
Tuan
HaNoi-VietNam
On 2014-10-28 01:35, Craig Lewis wrote:
> My experience is that once you hit this bug, those PGs are gone. I tried
> marking the primary OSD OUT, which caused this problem to move to the new
> primary OSD. Luckily for me, my affected PGs were using replication state in
> the secondary cluster. I ended up deleting the whole pool and recreating it.
>
> Which pools are 7 and 23? It's possible that it's something that's easy to
> replace.
>
> On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan <[email protected]> wrote:
>
>> Hi Craig, thanks for replying.
>> When I started that OSD, the log from "ceph -w" warned that pgs 7.9d8,
>> 23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.
>>
>> Those pgs are in the "active+degraded" state.
>> # ceph pg map 7.9d8
>> osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
>> (When I start osd.21, pg 7.9d8 and the three remaining pgs change to
>> "active+recovering".) osd.21 still goes down, with the following logs:
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com