I then followed someone's guidance, added 'mon compact on start = true' to the
config, and restarted one mon.
That mon did not rejoin the cluster until I added two mons deployed on
virtual machines with SSDs into
the cluster.
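For reference, the change itself is tiny; in ceph.conf it is just this (a sketch of my config; the option goes in the standard [mon] section):

```
[mon]
# compact the monitor's RocksDB store (store.db) every time the mon starts
mon compact on start = true
```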

And now the cluster is fine except for the pg status.
[image: image.png]
[image: image.png]
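In case the screenshots don't come through, the pg status I'm referring to comes from the standard ceph CLI (a sketch; the store.db path assumes a default mon data directory):

```shell
# overall cluster health and PG summary
ceph -s
ceph health detail
# list PGs stuck in non-active states
ceph pg dump_stuck inactive
# check how large each monitor's RocksDB store has grown
du -sh /var/lib/ceph/mon/ceph-*/store.db
```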

Zhenshi Zhou <deader...@gmail.com> wrote on Thu, 29 Oct 2020 at 20:29:

> Hi,
>
> I was so anxious a few hours ago because the sst files were growing so
> fast and I didn't think
> the space on the mon servers could hold them.
>
> Let me tell it from the beginning. I have a cluster with OSDs deployed on
> SATA disks (7200 rpm),
> 10 TB per OSD, and I used an EC pool for more space. I added new OSDs into
> the cluster last
> week and it has recovered well so far. After that, while the cluster was
> still recovering, I increased the pg_num.
> Besides that, the clients kept writing data to the cluster the whole time.
>
> And the cluster became unhealthy last night. Some OSDs were down and one
> mon was down.
> Then I found the mon servers' root directories were running out of free
> space. The sst files in /var/lib/ceph/mon/ceph-xxx/store.db/
> were growing rapidly.
>
>
> Frank Schilder <fr...@dtu.dk> wrote on Thu, 29 Oct 2020 at 19:15:
>
>> I think you really need to sit down and explain the full story. Dropping
>> one-liners with new information will not work via e-mail.
>>
>> I have never heard of the problem you are facing, so you did something
>> that possibly no-one else has done before. Unless we know the full history
>> from the last time the cluster was health_ok until now, it will almost
>> certainly not be possible to figure out what is going on via e-mail.
>>
>> Usually, setting "norebalance" and "norecovery" should stop any recovery
>> IO and allow the PGs to peer. If they do not become active, something is
>> wrong and the information we got so far does not give a clue what this
>> could be.
>>
>> Please post the output of "ceph health detail", "ceph osd pool stats" and
>> "ceph osd pool ls detail" and a log of actions and results since last
>> health_ok status here, maybe it gives a clue what is going on.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Zhenshi Zhou <deader...@gmail.com>
>> Sent: 29 October 2020 09:44:14
>> To: Frank Schilder
>> Cc: ceph-users
>> Subject: Re: [ceph-users] monitor sst files continue growing
>>
>> I reset the pg_num after adding OSDs; it made some PGs inactive (stuck in
>> the activating state).
>>
>> Frank Schilder <fr...@dtu.dk> wrote on Thu, 29 Oct 2020 at 15:56:
>> This does not explain incomplete and inactive PGs. Are you hitting
>> https://tracker.ceph.com/issues/46847 (see also the thread "Ceph does not
>> recover from OSD restart")? In that case, temporarily stopping and
>> restarting all new OSDs might help.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Zhenshi Zhou <deader...@gmail.com>
>> Sent: 29 October 2020 08:30:25
>> To: Frank Schilder
>> Cc: ceph-users
>> Subject: Re: [ceph-users] monitor sst files continue growing
>>
>> After adding OSDs into the cluster, the recovery and backfill progress
>> has not finished yet.
>>
>> Zhenshi Zhou <deader...@gmail.com> wrote on Thu, 29 Oct 2020 at 15:29:
>> I stopped the MGR because it consumed too much memory.
>> For the pg status: I added some OSDs to this cluster, and it
>>
>> Frank Schilder <fr...@dtu.dk> wrote on Thu, 29 Oct 2020 at 15:27:
>> Your problem is the overall cluster health. The MONs store cluster
>> history information that will be trimmed once it reaches HEALTH_OK.
>> Restarting the MONs only makes things worse right now. The health status is
>> a mess, no MGR, a bunch of PGs inactive, etc. This is what you need to
>> resolve. How did your cluster end up like this?
>>
>> It looks like all OSDs are up and in. You need to find out
>>
>> - why there are inactive PGs
>> - why there are incomplete PGs
>>
>> This usually happens when OSDs go missing.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Zhenshi Zhou <deader...@gmail.com>
>> Sent: 29 October 2020 07:37:19
>> To: ceph-users
>> Subject: [ceph-users] monitor sst files continue growing
>>
>> Hi all,
>>
>> My cluster is in a bad state. The sst files in
>> /var/lib/ceph/mon/xxx/store.db
>> keep growing, and the cluster warns that the mons are using a lot of disk
>> space.
>>
>> I set "mon compact on start = true" and restarted one of the monitors,
>> but it started compacting and has kept at it for a long time; it seems to
>> have no end.
>>
>> [image.png]
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
