Hi,

Yes, we have been following Sebastien's SSD/HDD mix blog, which seems to
be working OK. So it's 2 hosts, with both SSDs and HDDs on each.

We aren't setting "osd pool default min size", and it's currently
reporting as 0:

ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep osd_pool_default_min
   "osd_pool_default_min_size": "0",

I don't think it's totally stuck; if we take the OSD out of the pool, it
rebalances correctly.




-----Original Message-----
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Tuesday, 4 August 2015 9:21 PM
To: ceph-users@lists.ceph.com
Cc: Daniel Manzau
Subject: Re: [ceph-users] PG's Degraded on disk failure not remapped.


Hello,

On Tue, 4 Aug 2015 20:33:58 +1000 Daniel Manzau wrote:

> Hi Christian,
>
> True it's not exactly out of the box. Here is the ceph.conf.
>
A crush rule file and a description would help, too (are those 4 hosts, or
are the HDD and SSD OSDs sharing the same HW, as your pool size suggests?),
etc.
My guess is you're following Sebastien's blog entry on how to mix things
on the same host.
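
If so, the interesting part is the hand-edited crush map. A rule of that
style usually looks roughly like the sketch below; the ruleset number and
steps are assumptions on my part, only the "ssd" root name is taken from
the osd tree you posted:

    rule ssd {
            ruleset 1                             # assumed number
            type replicated
            min_size 1
            max_size 10
            step take ssd                         # "ssd" root from the tree
            step chooseleaf firstn 0 type host    # one replica per host
            step emit
    }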

> Could it be the " osd crush update on start = false" stopping the
> remapping of a disk on failure?
>
Doubt it, that would be a pretty significant bug.
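
That option only stops the OSDs from re-registering their crush location
when they start; the flip side is that placement has to be maintained by
hand, e.g. a re-created osd.18 would need something along the lines of
(weight and bucket names taken from your tree, command given as a sketch):

    ceph osd crush create-or-move osd.18 0.43 root=ssd host=ceph-store1-ssd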

OTOH, is your "osd_pool_default_size = 2" matched by an "osd pool default
min size = 1"?

As in, is your cluster (or at least the pool using SSDs) totally stuck at
this point?
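
Comparing the up and acting sets of one of the degraded PGs would also
show whether CRUSH simply fails to pick a replacement OSD, roughly:

    ceph health detail | grep undersized
    ceph pg map <pgid>    # <pgid> taken from the line above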

Christian
>
> [global]
> fsid = bfb7e666-f66d-45c0-b4fc-b98182fed666
> mon_initial_members = ceph-store1, ceph-store2, ceph-admin1
> mon_host = 10.66.8.2,10.66.8.3,10.66.8.1
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> public_network = 10.66.8.0/23
> cluster network = 10.66.16.0/23
>
> [osd]
> osd crush update on start = false
> osd_max_backfills = 2
> osd_recovery_op_priority = 2
> osd_recovery_max_active = 2
> osd_recovery_max_chunk = 4194304
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> admin socket = /var/run/ceph/rbd-client-$pid.asok
>
>
> Regards,
> Daniel
>
> -----Original Message-----
> From: Christian Balzer [mailto:ch...@gol.com]
> Sent: Tuesday, 4 August 2015 3:47 PM
> To: Daniel Manzau
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] PG's Degraded on disk failure not remapped.
>
>
> Hello,
>
> There are a number of reasons I can think of why this would happen.
> You say "default behaviour", but looking at your map it's obvious that
> you probably don't have a default cluster and CRUSH map.
> Your ceph.conf may help, too.
>
> Regards,
>
> Christian
> On Tue, 4 Aug 2015 13:05:54 +1000 Daniel Manzau wrote:
>
> > Hi Cephers,
> >
> > We've been testing drive failures and we're just trying to see if
> > the behaviour of our cluster is normal, or if we've set up something
> > wrong.
> >
> > In summary: the OSD is down and out, but the PGs are showing as
> > degraded and don't seem to want to remap. We'd have assumed that once
> > the OSD was marked out, a remap would have happened and we'd see
> > misplaced rather than degraded PGs.
> >
> >   cluster bfb7e824-f37d-45c0-a4fc-a98182fed985
> >      health HEALTH_WARN
> >             43 pgs degraded
> >             43 pgs stuck degraded
> >             44 pgs stuck unclean
> >             43 pgs stuck undersized
> >             43 pgs undersized
> >             recovery 36899/6822836 objects degraded (0.541%)
> >             recovery 813/6822836 objects misplaced (0.012%)
> >      monmap e3: 3 mons at
> > {ceph-admin1=10.66.8.1:6789/0,ceph-store1=10.66.8.2:6789/0,ceph-store2=10.66.8.3:6789/0}
> >             election epoch 950, quorum 0,1,2
> > ceph-admin1,ceph-store1,ceph-store2
> >      osdmap e6342: 36 osds: 35 up, 35 in; 1 remapped pgs
> >       pgmap v11805515: 1700 pgs, 3 pools, 13165 GB data, 3331 kobjects
> >             25941 GB used, 30044 GB / 55986 GB avail
> >             36899/6822836 objects degraded (0.541%)
> >             813/6822836 objects misplaced (0.012%)
> >                 1656 active+clean
> >                   43 active+undersized+degraded
> >                    1 active+remapped
> >   client io 491 kB/s rd, 3998 kB/s wr, 480 op/s
> >
> >
> > # id        weight  type name       up/down reweight
> > -6  43.56   root hdd
> > -2  21.78           host ceph-store1-hdd
> > 0   3.63                    osd.0   up      1
> > 2   3.63                    osd.2   up      1
> > 4   3.63                    osd.4   up      1
> > 6   3.63                    osd.6   up      1
> > 8   3.63                    osd.8   up      1
> > 10  3.63                    osd.10  up      1
> > -3  21.78           host ceph-store2-hdd
> > 1   3.63                    osd.1   up      1
> > 3   3.63                    osd.3   up      1
> > 5   3.63                    osd.5   up      1
> > 7   3.63                    osd.7   up      1
> > 9   3.63                    osd.9   up      1
> > 11  3.63                    osd.11  up      1
> > -1  11.48   root ssd
> > -4  5.74            host ceph-store1-ssd
> > 12  0.43                    osd.12  up      1
> > 13  0.43                    osd.13  up      1
> > 14  0.43                    osd.14  up      1
> > 16  0.43                    osd.16  up      1
> > 18  0.43                    osd.18  down    0
> > 19  0.43                    osd.19  up      1
> > 20  0.43                    osd.20  up      1
> > 21  0.43                    osd.21  up      1
> > 32  0.72                    osd.32  up      1
> > 33  0.72                    osd.33  up      1
> > 17  0.43                    osd.17  up      1
> > 15  0.43                    osd.15  up      1
> > -5  5.74            host ceph-store2-ssd
> > 22  0.43                    osd.22  up      1
> > 23  0.43                    osd.23  up      1
> > 24  0.43                    osd.24  up      1
> > 25  0.43                    osd.25  up      1
> > 26  0.43                    osd.26  up      1
> > 27  0.43                    osd.27  up      1
> > 28  0.43                    osd.28  up      1
> > 29  0.43                    osd.29  up      1
> > 30  0.43                    osd.30  up      1
> > 31  0.43                    osd.31  up      1
> > 34  0.72                    osd.34  up      1
> > 35  0.72                    osd.35  up      1
> >
> > Are we misunderstanding the default behaviour? Any help you can
> > provide will be very much appreciated.
> >
> > Regards,
> > Daniel
> >
> > W: www.3ca.com.au
> >
>
>


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
