Hi Craig Lewis,
My pool holds 300 TB of data, so I can't recreate a new pool and copy the
data across with "rados cppool" (it would take a very long time).
I upgraded Ceph to Giant (0.86), but the error persists :((
I think my problem is the "objects misplaced (0.320%)".
# ceph pg 23.96 query
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 79,
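In case it helps anyone following along, here is a minimal sketch of pulling those counters out of `ceph pg <pgid> query --format json`. The field path ("info" → "stats" → "stat_sum") is assumed from the Giant-era output quoted above and may differ in other releases:

```python
import json

def misplaced_counts(pg_query):
    """Extract per-PG object counters from parsed `ceph pg query` JSON.

    Assumes the counters live under info.stats.stat_sum, as in the
    Giant-era output above; missing keys default to 0.
    """
    stat_sum = pg_query["info"]["stats"]["stat_sum"]
    return {
        "missing_on_primary": stat_sum.get("num_objects_missing_on_primary", 0),
        "degraded": stat_sum.get("num_objects_degraded", 0),
        "misplaced": stat_sum.get("num_objects_misplaced", 0),
    }
```

You would feed it `json.loads()` of the command output, e.g. `misplaced_counts(json.loads(subprocess.check_output(["ceph", "pg", "23.96", "query", "--format", "json"])))`.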
cluster xxxxxx-xxxxx-xxxxx-xxxxx
health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck
degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs
undersized; recovery 308759/54799506 objects degraded (0.563%);
175270/54799506 objects misplaced (0.320%); 1/130 in osds are down;
flags noout,nodeep-scrub
pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
206 TB used, 245 TB / 452 TB avail
308759/54799506 objects degraded (0.563%); 175270/54799506 objects
misplaced (0.320%)
14708 active+clean
38 active+remapped
225 active+undersized+degraded
client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s
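To enumerate exactly which PGs are behind those summary counts, a hedged sketch filtering `ceph pg dump --format json` (assuming the Giant-era layout with a top-level "pg_stats" list of {"pgid", "state", ...} entries):

```python
def stuck_pgs(pg_dump, bad=("degraded", "undersized", "remapped")):
    """Return the sorted pgids whose state string contains any of the
    given substrings, from parsed `ceph pg dump --format json` output.

    The "pg_stats"/"pgid"/"state" layout is assumed from Giant-era
    output and may differ between Ceph releases.
    """
    return sorted(pg["pgid"] for pg in pg_dump["pg_stats"]
                  if any(b in pg["state"] for b in bad))
```

Against a live cluster you would call it as `stuck_pgs(json.loads(subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])))` and then `ceph pg query` each result.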
Checking the Ceph log:
2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718
pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715
n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1
lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter
Started/ReplicaActive/RepNotRecovering
It then logs many failures on many objects (e.g.
c03fe096/rbd_data.5348922ae8944a.000000000000306b, ...):
2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793,
time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96
103718
[PushOp(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24,
version: 103622'283374, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24@103622'283374,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:,
omap_complete:false)),PushOp(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24,
version: 103679'295624, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24@103679'295624,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:, omap_complete:false))])
Thanks!
--
Tuan
HaNoi-VietNam
On 2014-10-28 01:35, Craig Lewis wrote:
> My experience is that once you hit this bug, those PGs are gone. I tried
> marking the primary OSD OUT, which caused this problem to move to the new
> primary OSD. Luckily for me, my affected PGs were using replication state in
> the secondary cluster. I ended up deleting the whole pool and recreating it.
>
> Which pools are 7 and 23? It's possible that it's something that's easy to
> replace.
>
> On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan <[email protected]> wrote:
>
>> Hi Craig, thanks for replying.
>> When I started that OSD, the log from "ceph -w" warned that pgs 7.9d8,
>> 23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.
>>
>> Those pgs are in the "active+degraded" state.
>> # ceph pg map 7.9d8
>> osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
>> (When I start osd.21, pg 7.9d8 and the three remaining pgs change to
>> "active+recovering".) osd.21 still goes down, with the following logs:
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com