Hi Wido,

The 75% happened on 4 nodes of 24 OSDs each, with a pool size of two and a
minimum size of one. Is there any relation between this configuration and 75%?

Best regards,

On Tue, May 17, 2016 at 3:38 AM, Wido den Hollander <[email protected]> wrote:

>
> > Op 14 mei 2016 om 12:36 schreef Lazuardi Nasution <
> [email protected]>:
> >
> >
> > Hi Wido,
> >
> > Yes, you are right. After removing the down OSDs, reformatting them and
> > bringing them up again, at least until 75% of the total OSDs were up, my
> > Ceph cluster is healthy again. It seems there is a high probability of
> > data safety if the total number of active PGs equals the total number of
> > PGs and the total number of degraded PGs equals the total number of
> > undersized PGs, but it is better to check the PGs one by one to make sure
> > there are no incomplete, unfound and/or missing objects.
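The per-PG check described above can be sketched as a small filter over the kind of records `ceph pg dump --format json` emits. This is only an illustration: the field names (`state`, `stat_sum`, `num_objects_unfound`) are assumptions, not verified against a particular Ceph release, and the sample data is made up.

```python
# Hypothetical sketch: flag PGs whose state suggests possible data loss.
# Field names are assumed, not taken from a specific Ceph release.

def pgs_needing_attention(pg_stats):
    """Return pgids that are incomplete/stale/down or have unfound objects."""
    bad_markers = ("incomplete", "stale", "down")
    flagged = []
    for pg in pg_stats:
        state = pg.get("state", "")
        unfound = pg.get("stat_sum", {}).get("num_objects_unfound", 0)
        if unfound > 0 or any(m in state.split("+") for m in bad_markers):
            flagged.append(pg["pgid"])
    return flagged

sample = [
    {"pgid": "1.a", "state": "active+clean",
     "stat_sum": {"num_objects_unfound": 0}},
    {"pgid": "1.b", "state": "active+undersized+degraded",
     "stat_sum": {"num_objects_unfound": 0}},
    {"pgid": "1.c", "state": "incomplete",
     "stat_sum": {"num_objects_unfound": 0}},
    {"pgid": "1.d", "state": "active+recovering",
     "stat_sum": {"num_objects_unfound": 2}},
]

print(pgs_needing_attention(sample))  # flags 1.c and 1.d only
```

Note that undersized+degraded alone is not flagged here: those PGs still serve I/O from a surviving replica, which matches the observation above.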
> >
> > Anyway, why 75%? Can I reduce this value by increasing the replica count
> > of the pool?
> >
>
> How many OSDs have to be added back to allow the cluster to recover depends
> entirely on the CRUSH map.
>
> A CRUSH map has failure domains, which are usually hosts. You have to make
> sure you have enough 'hosts' online with OSDs for each replica.
>
> So with 3 replicas you need 3 hosts online with OSDs on them.
>
> You can lower the replica count of a pool (size), but that makes it more
> vulnerable to data loss.
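The host-as-failure-domain rule above can be illustrated with a minimal sketch. The helper function and node names are made up for illustration; a real check would read the output of `ceph osd tree` rather than a hardcoded list:

```python
# Illustrative sketch: with host as the CRUSH failure domain, recovering
# to full replica count needs at least `pool_size` distinct hosts that
# have up OSDs. Node names and OSD mapping are invented for this example.

def can_restore_full_size(osd_host_up, pool_size):
    """osd_host_up: iterable of (osd_id, host, is_up) tuples."""
    hosts_with_up_osds = {host for _osd, host, up in osd_host_up if up}
    return len(hosts_with_up_osds) >= pool_size

cluster = [
    (0, "node-a", True),
    (1, "node-a", True),
    (2, "node-b", True),
    (3, "node-c", False),  # all OSDs on node-c are down
]

print(can_restore_full_size(cluster, 2))  # True: node-a and node-b suffice
print(can_restore_full_size(cluster, 3))  # False: only 2 hosts have up OSDs
```

Note that a second OSD on node-a does not help: CRUSH will not place two replicas in the same failure domain.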
>
> Wido
>
> > Best regards,
> >
> > On Fri, May 13, 2016 at 5:04 PM, Wido den Hollander <[email protected]>
> wrote:
> >
> > >
> > > > Op 13 mei 2016 om 11:55 schreef Lazuardi Nasution <
> > > [email protected]>:
> > > >
> > > >
> > > > Hi Wido,
> > > >
> > > > The status is the same after 24 hours of running. It seems that the
> > > > status will not go to fully active+clean until all down OSDs are back
> > > > again. The only way to bring the down OSDs back is reformatting them,
> > > > or replacing them if the HDDs have hardware issues. Do you think that
> > > > is a safe way to do it?
> > > >
> > >
> > > Ah, you are probably lacking enough replicas to make the recovery
> > > proceed.
> > >
> > > If that is needed I would do this OSD by OSD. Your crushmap will
> > > probably tell you which OSDs you need to bring back before it works
> > > again.
> > >
> > > Wido
> > >
> > > > Best regards,
> > > >
> > > > On Fri, May 13, 2016 at 4:44 PM, Wido den Hollander <[email protected]>
> > > wrote:
> > > >
> > > > >
> > > > > > Op 13 mei 2016 om 11:34 schreef Lazuardi Nasution <
> > > > > [email protected]>:
> > > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > After a disaster and restarting for automatic recovery, I found
> > > > > > the following ceph status. Some OSDs cannot be restarted due to
> > > > > > file system corruption (it seems that XFS is fragile).
> > > > > >
> > > > > > [root@management-b ~]# ceph status
> > > > > >     cluster 3810e9eb-9ece-4804-8c56-b986e7bb5627
> > > > > >      health HEALTH_WARN
> > > > > >             209 pgs degraded
> > > > > >             209 pgs stuck degraded
> > > > > >             334 pgs stuck unclean
> > > > > >             209 pgs stuck undersized
> > > > > >             209 pgs undersized
> > > > > >             recovery 5354/77810 objects degraded (6.881%)
> > > > > >             recovery 1105/77810 objects misplaced (1.420%)
> > > > > >      monmap e1: 3 mons at {management-a=
> > > > > > 10.255.102.1:6789/0,management-b=10.255.102.2:6789/0,management-c=10.255.102.3:6789/0}
> > > > > >             election epoch 2308, quorum 0,1,2
> > > > > > management-a,management-b,management-c
> > > > > >      osdmap e25037: 96 osds: 49 up, 49 in; 125 remapped pgs
> > > > > >             flags sortbitwise
> > > > > >       pgmap v9024253: 2560 pgs, 5 pools, 291 GB data, 38905 objects
> > > > > >             678 GB used, 90444 GB / 91123 GB avail
> > > > > >             5354/77810 objects degraded (6.881%)
> > > > > >             1105/77810 objects misplaced (1.420%)
> > > > > >                 2226 active+clean
> > > > > >                  209 active+undersized+degraded
> > > > > >                  125 active+remapped
> > > > > >   client io 0 B/s rd, 282 kB/s wr, 10 op/s
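As a quick sanity check on the status above (back-of-the-envelope arithmetic, not a Ceph command): 77810 is exactly twice the 38905 objects, consistent with a pool size of two, and the degraded/misplaced percentages are fractions of those 77810 object copies rather than of the logical object count:

```python
# Reproduce the percentages reported by `ceph status` above.
objects, copies = 38905, 77810
degraded, misplaced = 5354, 1105

assert copies == 2 * objects  # two copies per object (pool size 2)

print(round(100 * degraded / copies, 3))   # 6.881, as reported
print(round(100 * misplaced / copies, 3))  # 1.42, as reported
```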
> > > > > >
> > > > > > Since the total number of active PGs equals the total number of
> > > > > > PGs, and the total number of degraded PGs equals the total number
> > > > > > of undersized PGs, does it mean that all PGs have at least one
> > > > > > good replica, so that I can just mark the down OSDs lost or remove
> > > > > > them, reformat and then restart them if there is no hardware issue
> > > > > > with the HDDs? Which PG status should I pay more attention to,
> > > > > > degraded or undersized, given the possibility of lost objects?
> > > > > >
> > > > >
> > > > > Yes. Your system is not reporting any inactive, unfound or stale
> > > > > PGs, so that is good news.
> > > > >
> > > > > However, I recommend that you wait for the system to become fully
> > > > > active+clean before you start removing any OSDs or formatting hard
> > > > > drives. Better safe than sorry.
> > > > >
> > > > > Wido
> > > > >
> > > > > > Best regards,
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > [email protected]
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >
> > >
>