Re: [ceph-users] PG's Degraded on disk failure not remapped.

Daniel Manzau Tue, 04 Aug 2015 03:34:55 -0700

Hi Christian,

True, it's not exactly out of the box. Here is the ceph.conf.


Could the "osd crush update on start = false" setting be what stops the
PGs from being remapped when a disk fails?


[global]
fsid = bfb7e666-f66d-45c0-b4fc-b98182fed666
mon_initial_members = ceph-store1, ceph-store2, ceph-admin1
mon_host = 10.66.8.2,10.66.8.3,10.66.8.1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
public_network = 10.66.8.0/23
cluster network = 10.66.16.0/23

[osd]
osd crush update on start = false
osd_max_backfills = 2
osd_recovery_op_priority = 2
osd_recovery_max_active = 2
osd_recovery_max_chunk = 4194304

[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/rbd-client-$pid.asok
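
Note that the file above mixes two spellings of option names ("cluster network" with spaces, "osd_max_backfills" with underscores). Ceph treats spaces, underscores and hyphens in option names as equivalent, so both forms are read as the same kind of key. As a rough sketch only, the snippet below reads an excerpt of this config with Python's standard configparser and normalizes the names so a setting such as "osd crush update on start" can be looked up under either spelling. The helper names normalize and load_options are made up for illustration and are not part of any Ceph tooling; Ceph's own parser may differ in details this sketch ignores.

```python
import configparser

# Excerpt of the ceph.conf posted above (only a few of its lines).
CEPH_CONF = """\
[global]
osd_pool_default_size = 2
cluster network = 10.66.16.0/23

[osd]
osd crush update on start = false
osd_max_backfills = 2
"""


def normalize(name: str) -> str:
    """Fold spaces and hyphens to underscores so both spellings compare equal."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def load_options(text: str) -> dict:
    """Return {section: {normalized_option: value}} for an INI-style ceph.conf."""
    # Only "=" is treated as a delimiter, and "$pid"-style values are left alone.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    return {
        section: {normalize(key): value for key, value in parser.items(section)}
        for section in parser.sections()
    }


options = load_options(CEPH_CONF)
print(options["osd"]["osd_crush_update_on_start"])
print(options["global"]["cluster_network"])
```

This only shows how the setting is spelled and read. It says nothing about whether the setting is responsible for the PGs staying degraded.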


Regards,
Daniel

-----Original Message-----
From: Christian Balzer [mailto:[email protected]]
Sent: Tuesday, 4 August 2015 3:47 PM
To: Daniel Manzau
Cc: [email protected]
Subject: Re: [ceph-users] PG's Degraded on disk failure not remapped.


Hello,

There are a number of reasons I can think of why this would happen.
You say "default behavior", but judging by your map you probably don't
have a default cluster or CRUSH map.
Posting your ceph.conf may help, too.

Regards,

Christian
On Tue, 4 Aug 2015 13:05:54 +1000 Daniel Manzau wrote:

> Hi Cephers,
>
> We've been testing drive failures and we're trying to work out whether
> the behaviour of our cluster is normal, or whether we've set something
> up wrong.
>
> In summary: the OSD is down and out, but the PGs are showing as
> degraded and don't seem to want to remap. We'd have assumed that once
> the OSD was marked out, a remap would have happened and we'd see
> misplaced rather than degraded PGs.
>
>   cluster bfb7e824-f37d-45c0-a4fc-a98182fed985
>      health HEALTH_WARN
>             43 pgs degraded
>             43 pgs stuck degraded
>             44 pgs stuck unclean
>             43 pgs stuck undersized
>             43 pgs undersized
>             recovery 36899/6822836 objects degraded (0.541%)
>             recovery 813/6822836 objects misplaced (0.012%)
>      monmap e3: 3 mons at {ceph-admin1=10.66.8.1:6789/0,ceph-store1=10.66.8.2:6789/0,ceph-store2=10.66.8.3:6789/0}
>             election epoch 950, quorum 0,1,2 ceph-admin1,ceph-store1,ceph-store2
>      osdmap e6342: 36 osds: 35 up, 35 in; 1 remapped pgs
>       pgmap v11805515: 1700 pgs, 3 pools, 13165 GB data, 3331 kobjects
>             25941 GB used, 30044 GB / 55986 GB avail
>             36899/6822836 objects degraded (0.541%)
>             813/6822836 objects misplaced (0.012%)
>                 1656 active+clean
>                   43 active+undersized+degraded
>                    1 active+remapped
>   client io 491 kB/s rd, 3998 kB/s wr, 480 op/s
>
>
> # id  weight  type name       up/down reweight
> -6    43.56   root hdd
> -2    21.78           host ceph-store1-hdd
> 0     3.63                    osd.0   up      1
> 2     3.63                    osd.2   up      1
> 4     3.63                    osd.4   up      1
> 6     3.63                    osd.6   up      1
> 8     3.63                    osd.8   up      1
> 10    3.63                    osd.10  up      1
> -3    21.78           host ceph-store2-hdd
> 1     3.63                    osd.1   up      1
> 3     3.63                    osd.3   up      1
> 5     3.63                    osd.5   up      1
> 7     3.63                    osd.7   up      1
> 9     3.63                    osd.9   up      1
> 11    3.63                    osd.11  up      1
> -1    11.48   root ssd
> -4    5.74            host ceph-store1-ssd
> 12    0.43                    osd.12  up      1
> 13    0.43                    osd.13  up      1
> 14    0.43                    osd.14  up      1
> 16    0.43                    osd.16  up      1
> 18    0.43                    osd.18  down    0
> 19    0.43                    osd.19  up      1
> 20    0.43                    osd.20  up      1
> 21    0.43                    osd.21  up      1
> 32    0.72                    osd.32  up      1
> 33    0.72                    osd.33  up      1
> 17    0.43                    osd.17  up      1
> 15    0.43                    osd.15  up      1
> -5    5.74            host ceph-store2-ssd
> 22    0.43                    osd.22  up      1
> 23    0.43                    osd.23  up      1
> 24    0.43                    osd.24  up      1
> 25    0.43                    osd.25  up      1
> 26    0.43                    osd.26  up      1
> 27    0.43                    osd.27  up      1
> 28    0.43                    osd.28  up      1
> 29    0.43                    osd.29  up      1
> 30    0.43                    osd.30  up      1
> 31    0.43                    osd.31  up      1
> 34    0.72                    osd.34  up      1
> 35    0.72                    osd.35  up      1
>
> Are we misunderstanding the default behaviour? Any help you can
> provide will be very much appreciated.
>
> Regards,
> Daniel
>
> W: www.3ca.com.au
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
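
As a quick consistency check on the status output quoted above, the sketch below recomputes the reported percentages and PG counts from the raw numbers in that output. It assumes nothing about the cluster beyond those pasted figures; the variable names are illustrative only.

```python
# Figures copied from the status output quoted above.
REPORTED_TOTAL = 6822836   # denominator shown in the "objects degraded" line
DEGRADED = 36899
MISPLACED = 813
TOTAL_PGS = 1700
PG_STATES = {
    "active+clean": 1656,
    "active+undersized+degraded": 43,
    "active+remapped": 1,
}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to three decimals, matching the status output."""
    return round(100.0 * part / whole, 3)


degraded_pct = percent(DEGRADED, REPORTED_TOTAL)
misplaced_pct = percent(MISPLACED, REPORTED_TOTAL)

# Every PG that is not active+clean counts as unclean.
unclean_pgs = sum(n for state, n in PG_STATES.items() if state != "active+clean")

print(f"degraded:  {degraded_pct}%")
print(f"misplaced: {misplaced_pct}%")
print(f"PGs accounted for: {sum(PG_STATES.values())} of {TOTAL_PGS}")
print(f"unclean PGs: {unclean_pgs}")
```

The numbers line up with the report: 43 degraded PGs plus the single remapped PG give the 44 "stuck unclean" PGs, so only one PG was actually remapped and the other 43 stayed undersized.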


-- 
Christian Balzer        Network/Systems Engineer
[email protected]           Global OnLine Japan/Fusion Communications
http://www.gol.com/