Re: [ceph-users] cluster is not stable

Zhenshi Zhou Thu, 14 Mar 2019 01:51:48 -0700

Hi huang,

I think I've found the root cause of the monmap containing no
features. When I moved the servers from one place to another, I modified
the monmap once.

However, the monmap was not the same on all mons. I modified the monmap
on one of the mons, and created it from scratch on the other two mons for
convenience. (ssh is disabled between the servers, and I didn't want to
repeat the modification 3 times.)

As a result, the epoch number was not the same across the 3 mons. I think
this is the root cause.
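
An epoch mismatch like this can be checked by printing each mon's monmap
with 'monmaptool --print' (for example after running
'ceph-mon -i <id> --extract-monmap <file>' on a stopped mon) and comparing
the "epoch" lines. A small sketch follows; the helper names are hypothetical,
and it assumes the printed monmap has a line of the form "epoch N".

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Print the epoch from `monmaptool --print <file>` output read on stdin.
# Assumes the output contains a line of the form "epoch N".
monmap_epoch() {
  awk '$1 == "epoch" { print $2; exit }'
}

# Succeed only if every epoch passed as an argument is identical.
epochs_match() {
  local first="$1" e
  for e in "$@"; do
    [ "$e" = "$first" ] || return 1
  done
}
```

Usage would be something like running
'monmaptool --print /tmp/monmap | monmap_epoch' on each mon and passing the
three numbers to epochs_match.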

I dumped the monmap from the leader mon, transferred it to the other two
mons with the nc command, and injected it into each of them. The mon
features have recovered now. After watching the cluster status for a while,
there are no more "mark-down" operations on the osds.
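
For anyone who needs to do the same, the steps above could look roughly like
the sketch below. It only prints the commands so they can be reviewed before
anything is run. The function name, the port 9999, the /tmp/monmap path and
the ceph-mon@<id> systemd unit name are assumptions for illustration, not the
exact commands I typed. It also assumes the leader can still answer
'ceph mon getmap'. Note that nc option syntax differs between netcat variants.

```shell
#!/usr/bin/env bash
# Print (do not run) the commands for copying the leader's monmap to another mon.
# All argument values and defaults are placeholders chosen for illustration.
monmap_sync_plan() {
  local target_host="$1" mon_id="$2" port="${3:-9999}" map="${4:-/tmp/monmap}"
  cat <<EOF
# on the target mon: wait for the file (some netcat variants need 'nc -l -p $port')
nc -l $port > $map
# on the leader mon: dump the current monmap and send it to the target
ceph mon getmap -o $map
nc $target_host $port < $map
# on the target mon: check the epoch, then inject with the daemon stopped
monmaptool --print $map
systemctl stop ceph-mon@$mon_id
ceph-mon -i $mon_id --inject-monmap $map
systemctl start ceph-mon@$mon_id
EOF
}
```

For example, 'monmap_sync_plan mon2-host ceph-mon2' prints the plan for one
target mon; the host and mon names there are made up.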

# ceph mon feature ls
all features
        supported: [kraken,luminous,mimic,osdmap-prune]
        persistent: [kraken,luminous,mimic,osdmap-prune]
on current monmap (epoch 3)
        persistent: [kraken,luminous,mimic,osdmap-prune]
        required: [kraken,luminous,mimic,osdmap-prune]
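
A quick way to check for this condition is to look for luminous in the
"required" list of the current monmap, since the OSD code quoted further down
only sends beacons when that feature is required. A minimal sketch, assuming
the output layout shown above; the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.
# Succeeds if FEATURE appears in the "required:" list of the
# "on current monmap" section of `ceph mon feature ls` output (read on stdin).
monmap_requires_feature() {
  local feature="$1"
  awk -v want="$feature" '
    /^on current monmap/ { in_current = 1; next }
    in_current && /required:/ {
      line = $0
      sub(/^[^[]*\[/, "", line)   # drop everything up to and including "["
      sub(/\].*$/, "", line)      # drop "]" and anything after it
      n = split(line, feats, ",")
      for (i = 1; i <= n; i++) if (feats[i] == want) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}
```

For example:
ceph mon feature ls | monmap_requires_feature luminous || echo "luminous is not required"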

Thanks for all your help, guys :)


On Thu, 14 Mar 2019 at 3:20 PM, Zhenshi Zhou <[email protected]> wrote:

> Hi,
>
> I'll try that command soon.
>
> It's a new cluster installed with mimic. I'm not sure of the exact reason,
> but as far as I can tell, 2 things may have caused this issue. One is that
> I moved these servers from another datacenter to this one, following the
> steps in [1]. The other is that I created a bridge on the interface that
> the ceph connections use.
>
> [1]
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-the-messy-way
>
>
> Thanks
>
> On Thu, 14 Mar 2019 at 3:04 PM, huang jun <[email protected]> wrote:
>
>> You can try those commands, but you may need to find the root cause of
>> why the current monmap contains no features at all. Did you upgrade the
>> cluster from luminous to mimic,
>> or is it a new cluster installed with mimic?
>>
>>
>> On Thu, 14 Mar 2019 at 2:37 PM, Zhenshi Zhou <[email protected]> wrote:
>> >
>> > Hi huang,
>> >
>> > It's a pre-production environment. If everything is fine, I'll use it for production.
>> >
>> > My cluster is on mimic. Should I set all the features you listed in the command?
>> >
>> > Thanks
>> >
>> > On Thu, 14 Mar 2019 at 2:11 PM, huang jun <[email protected]> wrote:
>> >>
>> >> sorry, the script should be
>> >> for f in kraken luminous mimic osdmap-prune; do
>> >>   ceph mon feature set $f --yes-i-really-mean-it
>> >> done
>> >>
>> >> On Thu, 14 Mar 2019 at 2:04 PM, huang jun <[email protected]> wrote:
>> >> >
>> >> > ok, if this is a **test environment**, you can try
>> >> > for f in 'kraken,luminous,mimic,osdmap-prune'; do
>> >> >   ceph mon feature set $f --yes-i-really-mean-it
>> >> > done
>> >> >
>> >> > If it is a production environment, you should evaluate the risk first,
>> >> > and maybe set up a test cluster to try it on first.
>> >> >
>> >> > On Thu, 14 Mar 2019 at 1:56 PM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >
>> >> > > # ceph mon feature ls
>> >> > > all features
>> >> > >         supported: [kraken,luminous,mimic,osdmap-prune]
>> >> > >         persistent: [kraken,luminous,mimic,osdmap-prune]
>> >> > > on current monmap (epoch 2)
>> >> > >         persistent: [none]
>> >> > >         required: [none]
>> >> > >
>> >> > > On Thu, 14 Mar 2019 at 1:50 PM, huang jun <[email protected]> wrote:
>> >> > >>
>> >> > >> what's the output of 'ceph mon feature ls'?
>> >> > >>
>> >> > >> From the code, maybe the mon's required features do not contain luminous:
>> >> > >> void OSD::send_beacon(const ceph::coarse_mono_clock::time_point& now)
>> >> > >> {
>> >> > >>   const auto& monmap = monc->monmap;
>> >> > >>   // send beacon to mon even if we are just connected, and the monmap is not
>> >> > >>   // initialized yet by then.
>> >> > >>   if (monmap.epoch > 0 &&
>> >> > >>       monmap.get_required_features().contains_all(
>> >> > >>         ceph::features::mon::FEATURE_LUMINOUS)) {
>> >> > >>     dout(20) << __func__ << " sending" << dendl;
>> >> > >>     MOSDBeacon* beacon = nullptr;
>> >> > >>     {
>> >> > >>       std::lock_guard l{min_last_epoch_clean_lock};
>> >> > >>       beacon = new MOSDBeacon(osdmap->get_epoch(), min_last_epoch_clean);
>> >> > >>       std::swap(beacon->pgs, min_last_epoch_clean_pgs);
>> >> > >>       last_sent_beacon = now;
>> >> > >>     }
>> >> > >>     monc->send_mon_message(beacon);
>> >> > >>   } else {
>> >> > >>     dout(20) << __func__ << " not sending" << dendl;
>> >> > >>   }
>> >> > >> }
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >> On Thu, 14 Mar 2019 at 12:43 PM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >
>> >> > >> > Hi,
>> >> > >> >
>> >> > >> > One of the logs shows the beacon not being sent, as below:
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 tick_without_osd_lock
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 can_inc_scrubs_pending 0 -> 1 (max 1, active 0)
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 scrub_time_permit should run between 0 - 24 now 12 = yes
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 scrub_load_below_threshold loadavg per cpu 0 < max 0.5 = yes
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub load_is_low=1
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 sched_scrub 1.79 scheduled at 2019-03-14 13:17:51.290050 > 2019-03-14 12:41:15.723848
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 sched_scrub done
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 B; target 25 obj/sec or 5 MiB/sec
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 promote_throttle_recalibrate  new_prob 1000
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 10 osd.5 17032 promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted new_prob 1000, prob 1000 -> 1000
>> >> > >> > 2019-03-14 12:41:15.722 7f3c27684700 20 osd.5 17032 send_beacon not sending
>> >> > >> >
>> >> > >> >
>> >> > >> > On Thu, 14 Mar 2019 at 12:30 PM, huang jun <[email protected]> wrote:
>> >> > >> >>
>> >> > >> >> An osd will not send beacons to the mon if it's not in the ACTIVE state,
>> >> > >> >> so you could turn on debug_osd=20 on one osd to see what is going on.
>> >> > >> >>
>> >> > >> >> On Thu, 14 Mar 2019 at 11:07 AM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >
>> >> > >> >> > What's more, I find that the osds don't send beacons all the time: some osds send beacons
>> >> > >> >> > for a period of time and then stop sending them.
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> > On Thu, 14 Mar 2019 at 10:57 AM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>
>> >> > >> >> >> Hi
>> >> > >> >> >>
>> >> > >> >> >> I set the config on every osd and checked whether all osds send beacons
>> >> > >> >> >> to the monitors.
>> >> > >> >> >>
>> >> > >> >> >> The result shows that only some of the osds send beacons, and the monitor
>> >> > >> >> >> receives all the beacons that those osds send out.
>> >> > >> >> >>
>> >> > >> >> >> But why don't some osds send beacons?
>> >> > >> >> >>
>> >> > >> >> >> On Wed, 13 Mar 2019 at 11:02 PM, huang jun <[email protected]> wrote:
>> >> > >> >> >>>
>> >> > >> >> >>> Sorry for not making it clear. On one of your osds, you may need to set
>> >> > >> >> >>> osd_beacon_report_interval = 5
>> >> > >> >> >>> and debug_ms=1 and then restart the osd process. Then check the osd
>> >> > >> >> >>> log with 'grep beacon /var/log/ceph/ceph-osd.$id.log'
>> >> > >> >> >>> to make sure the osd sends beacons to the mon. If it does, you
>> >> > >> >> >>> should also turn on debug_ms=1 on the leader mon,
>> >> > >> >> >>> restart the mon process, and then check the mon log to make sure the mon
>> >> > >> >> >>> received the osd beacons.
>> >> > >> >> >>>
>> >> > >> >> >>> On Wed, 13 Mar 2019 at 8:20 PM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>> >
>> >> > >> >> >>> > And now, new errors are appearing..
>> >> > >> >> >>> >
>> >> > >> >> >>> >
>> >> > >> >> >>> > On Wed, 13 Mar 2019 at 2:58 PM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>> >>
>> >> > >> >> >>> >> Hi,
>> >> > >> >> >>> >>
>> >> > >> >> >>> >> I hadn't set osd_beacon_report_interval, so it must have been at the default value.
>> >> > >> >> >>> >> I have now set osd_beacon_report_interval to 60 and debug_mon to 10.
>> >> > >> >> >>> >>
>> >> > >> >> >>> >> Attached is the leader monitor log; the "mark-down" operations are at 14:22.
>> >> > >> >> >>> >>
>> >> > >> >> >>> >> Thanks
>> >> > >> >> >>> >>
>> >> > >> >> >>> >> On Wed, 13 Mar 2019 at 2:07 PM, huang jun <[email protected]> wrote:
>> >> > >> >> >>> >>>
>> >> > >> >> >>> >>> Can you get the value of osd_beacon_report_interval? The default
>> >> > >> >> >>> >>> is 300; you can set it to 60, or maybe turn on debug_ms=1 and debug_mon=10
>> >> > >> >> >>> >>> to get more info.
>> >> > >> >> >>> >>>
>> >> > >> >> >>> >>>
>> >> > >> >> >>> >>> On Wed, 13 Mar 2019 at 1:20 PM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>> >>> >
>> >> > >> >> >>> >>> > Hi,
>> >> > >> >> >>> >>> >
>> >> > >> >> >>> >>> > The servers are connected to the same switch.
>> >> > >> >> >>> >>> > I can ping from any one of the servers to the other servers
>> >> > >> >> >>> >>> > without any packet loss, and the average round trip time
>> >> > >> >> >>> >>> > is under 0.1 ms.
>> >> > >> >> >>> >>> >
>> >> > >> >> >>> >>> > Thanks
>> >> > >> >> >>> >>> >
>> >> > >> >> >>> >>> > On Wed, 13 Mar 2019 at 12:06 PM, Ashley Merrick <[email protected]> wrote:
>> >> > >> >> >>> >>> >>
>> >> > >> >> >>> >>> >> Can you ping all your OSD servers from all your mons, and ping your mons from all your OSD servers?
>> >> > >> >> >>> >>> >>
>> >> > >> >> >>> >>> >> I’ve seen this where a route wasn’t working in one direction, so it made OSDs flap when it used that mon to check availability.
>> >> > >> >> >>> >>> >>
>> >> > >> >> >>> >>> >> On Wed, 13 Mar 2019 at 11:50 AM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>> After checking the network and syslog/dmesg, I think it's not a network or hardware issue. Now some
>> >> > >> >> >>> >>> >>> osds are being marked down every 15 minutes.
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>> Here is ceph.log:
>> >> > >> >> >>> >>> >>> 2019-03-13 11:06:26.290701 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6756 : cluster [INF] Cluster is now healthy
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.705787 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6757 : cluster [INF] osd.1 marked down after no beacon
>> for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.705858 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6758 : cluster [INF] osd.2 marked down after no beacon
>> for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.705920 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6759 : cluster [INF] osd.4 marked down after no beacon
>> for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.705957 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6760 : cluster [INF] osd.6 marked down after no beacon
>> for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.705999 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6761 : cluster [INF] osd.7 marked down after no beacon
>> for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706040 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6762 : cluster [INF] osd.10 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706079 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6763 : cluster [INF] osd.11 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706118 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6764 : cluster [INF] osd.12 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706155 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6765 : cluster [INF] osd.13 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706195 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6766 : cluster [INF] osd.14 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706233 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6767 : cluster [INF] osd.15 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706273 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6768 : cluster [INF] osd.16 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706312 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6769 : cluster [INF] osd.17 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706351 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6770 : cluster [INF] osd.18 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706385 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6771 : cluster [INF] osd.19 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706423 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6772 : cluster [INF] osd.20 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706503 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6773 : cluster [INF] osd.22 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706549 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6774 : cluster [INF] osd.23 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706587 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6775 : cluster [INF] osd.25 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706625 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6776 : cluster [INF] osd.26 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706665 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6777 : cluster [INF] osd.27 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706703 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6778 : cluster [INF] osd.28 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706741 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6779 : cluster [INF] osd.30 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706779 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6780 : cluster [INF] osd.31 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706817 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6781 : cluster [INF] osd.33 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706856 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6782 : cluster [INF] osd.34 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706894 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6783 : cluster [INF] osd.36 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706930 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6784 : cluster [INF] osd.38 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.706974 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6785 : cluster [INF] osd.40 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707013 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6786 : cluster [INF] osd.41 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707051 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6787 : cluster [INF] osd.42 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707090 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6788 : cluster [INF] osd.44 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707128 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6789 : cluster [INF] osd.45 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707166 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6790 : cluster [INF] osd.46 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707204 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6791 : cluster [INF] osd.48 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707242 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6792 : cluster [INF] osd.49 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707279 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6793 : cluster [INF] osd.50 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707317 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6794 : cluster [INF] osd.51 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707357 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6795 : cluster [INF] osd.53 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707396 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6796 : cluster [INF] osd.54 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707435 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6797 : cluster [INF] osd.56 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707488 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6798 : cluster [INF] osd.59 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.707533 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6799 : cluster [INF] osd.61 marked down after no
>> beacon for 900.067020 seconds
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.711989 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6800 : cluster [WRN] Health check failed: 43 osds down
>> (OSD_DOWN)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.843542 osd.15 osd.15
>> 10.39.0.35:6808/541558 157 : cluster [WRN] Monitor daemon marked osd.15
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.711989 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6800 : cluster [WRN] Health check failed: 43 osds down
>> (OSD_DOWN)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.843542 osd.15 osd.15
>> 10.39.0.35:6808/541558 157 : cluster [WRN] Monitor daemon marked osd.15
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.723955 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6802 : cluster [INF] osd.15 10.39.0.35:6808/541558 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724094 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6803 : cluster [INF] osd.51 10.39.0.39:6802/561995 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724177 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6804 : cluster [INF] osd.45 10.39.0.39:6800/561324 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724220 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6805 : cluster [INF] osd.1 10.39.0.34:6801/546469 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724260 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6806 : cluster [INF] osd.17 10.39.0.35:6806/541774 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724300 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6807 : cluster [INF] osd.50 10.39.0.39:6828/561887 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724348 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6808 : cluster [INF] osd.25 10.39.0.36:6804/548005 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724392 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6809 : cluster [INF] osd.13 10.39.0.35:6800/541337 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724438 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6810 : cluster [INF] osd.59 10.39.0.40:6807/570951 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724511 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6811 : cluster [INF] osd.53 10.39.0.39:6816/562213 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724555 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6812 : cluster [INF] osd.19 10.39.0.36:6802/547356 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724597 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6813 : cluster [INF] osd.26 10.39.0.36:6816/548112 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724647 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6814 : cluster [INF] osd.27 10.39.0.37:6803/547560 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724688 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6815 : cluster [INF] osd.2 10.39.0.34:6808/546587 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724742 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6816 : cluster [INF] osd.7 10.39.0.34:6802/547173 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724787 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6817 : cluster [INF] osd.40 10.39.0.38:6805/552745 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724839 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6818 : cluster [INF] osd.36 10.39.0.38:6814/552289 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724890 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6819 : cluster [INF] osd.54 10.39.0.40:6802/570399 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724941 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6820 : cluster [INF] osd.46 10.39.0.39:6807/561444 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.724989 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6821 : cluster [INF] osd.28 10.39.0.37:6808/547680 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725075 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6822 : cluster [INF] osd.41 10.39.0.38:6802/552890 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725121 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6823 : cluster [INF] osd.20 10.39.0.36:6812/547465 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725160 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6824 : cluster [INF] osd.42 10.39.0.38:6832/553002 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725203 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6825 : cluster [INF] osd.61 10.39.0.40:6801/571166 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725257 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6826 : cluster [INF] osd.18 10.39.0.36:6805/547240 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725299 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6827 : cluster [INF] osd.22 10.39.0.36:6800/547682 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725362 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6828 : cluster [INF] osd.49 10.39.0.39:6805/561776 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725412 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6829 : cluster [INF] osd.12 10.39.0.35:6802/541229 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725484 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6830 : cluster [INF] osd.34 10.39.0.37:6802/548338 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725560 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6831 : cluster [INF] osd.23 10.39.0.36:6810/547790 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725609 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6832 : cluster [INF] osd.16 10.39.0.35:6801/541666 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725662 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6833 : cluster [INF] osd.56 10.39.0.40:6829/570623 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725753 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6834 : cluster [INF] osd.48 10.39.0.39:6806/561666 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725818 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6835 : cluster [INF] osd.33 10.39.0.37:6810/548230 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725868 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6836 : cluster [INF] osd.30 10.39.0.37:6829/547901 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725976 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6837 : cluster [INF] osd.6 10.39.0.34:6805/547049 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.726022 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6838 : cluster [INF] osd.4 10.39.0.34:6822/546816 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.849960 osd.10 osd.10
>> 10.39.0.35:6824/540996 155 : cluster [WRN] Monitor daemon marked osd.10
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.853594 osd.1 osd.1
>> 10.39.0.34:6801/546469 165 : cluster [WRN] Monitor daemon marked osd.1
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.862261 osd.53 osd.53
>> 10.39.0.39:6816/562213 153 : cluster [WRN] Monitor daemon marked osd.53
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.863580 osd.50 osd.50
>> 10.39.0.39:6828/561887 155 : cluster [WRN] Monitor daemon marked osd.50
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.765514 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6840 : cluster [WRN] Health check failed: Reduced data
>> availability: 38 pgs inactive, 78 pgs peering (PG_AVAILABILITY)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.765574 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6841 : cluster [WRN] Health check failed: too few PGs
>> per OSD (28 < min 30) (TOO_FEW_PGS)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.726065 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6842 : cluster [INF] Health check cleared: OSD_DOWN
>> (was: 6 osds down)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.729961 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6843 : cluster [INF] osd.11 10.39.0.35:6805/541106 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.731669 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6844 : cluster [INF] osd.10 10.39.0.35:6824/540996 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.731789 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6845 : cluster [INF] osd.14 10.39.0.35:6804/541448 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.731859 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6846 : cluster [INF] osd.44 10.39.0.38:6800/553222 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.731926 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6847 : cluster [INF] osd.31 10.39.0.37:6807/548009 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:23.731975 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6848 : cluster [INF] osd.38 10.39.0.38:6801/552512 boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.842413 osd.12 osd.12
>> 10.39.0.35:6802/541229 157 : cluster [WRN] Monitor daemon marked osd.12
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.851780 osd.7 osd.7
>> 10.39.0.34:6802/547173 157 : cluster [WRN] Monitor daemon marked osd.7
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.852501 osd.4 osd.4
>> 10.39.0.34:6822/546816 163 : cluster [WRN] Monitor daemon marked osd.4
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.853408 osd.19 osd.19
>> 10.39.0.36:6802/547356 155 : cluster [WRN] Monitor daemon marked osd.19
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.860298 osd.25 osd.25
>> 10.39.0.36:6804/548005 155 : cluster [WRN] Monitor daemon marked osd.25
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.860945 osd.44 osd.44
>> 10.39.0.38:6800/553222 155 : cluster [WRN] Monitor daemon marked osd.44
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.844427 osd.11 osd.11
>> 10.39.0.35:6805/541106 159 : cluster [WRN] Monitor daemon marked osd.11
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.848239 osd.56 osd.56
>> 10.39.0.40:6829/570623 161 : cluster [WRN] Monitor daemon marked osd.56
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.854657 osd.59 osd.59
>> 10.39.0.40:6807/570951 151 : cluster [WRN] Monitor daemon marked osd.59
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:24.772150 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6851 : cluster [WRN] Health check failed: Degraded
>> data redundancy: 420/332661 objects degraded (0.126%), 1 pg degraded
>> (PG_DEGRADED)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.840603 osd.30 osd.30
>> 10.39.0.37:6829/547901 169 : cluster [WRN] Monitor daemon marked osd.30
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.850658 osd.16 osd.16
>> 10.39.0.35:6801/541666 161 : cluster [WRN] Monitor daemon marked osd.16
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.852257 osd.40 osd.40
>> 10.39.0.38:6805/552745 157 : cluster [WRN] Monitor daemon marked osd.40
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.854963 osd.48 osd.48
>> 10.39.0.39:6806/561666 145 : cluster [WRN] Monitor daemon marked osd.48
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.861335 osd.42 osd.42
>> 10.39.0.38:6832/553002 151 : cluster [WRN] Monitor daemon marked osd.42
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.861593 osd.22 osd.22
>> 10.39.0.36:6800/547682 159 : cluster [WRN] Monitor daemon marked osd.22
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.845502 osd.54 osd.54
>> 10.39.0.40:6802/570399 147 : cluster [WRN] Monitor daemon marked osd.54
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.848701 osd.14 osd.14
>> 10.39.0.35:6804/541448 159 : cluster [WRN] Monitor daemon marked osd.14
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.854024 osd.51 osd.51
>> 10.39.0.39:6802/561995 157 : cluster [WRN] Monitor daemon marked osd.51
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.858817 osd.26 osd.26
>> 10.39.0.36:6816/548112 165 : cluster [WRN] Monitor daemon marked osd.26
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.859382 osd.6 osd.6
>> 10.39.0.34:6805/547049 161 : cluster [WRN] Monitor daemon marked osd.6
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.862837 osd.45 osd.45
>> 10.39.0.39:6800/561324 145 : cluster [WRN] Monitor daemon marked osd.45
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.871469 osd.13 osd.13
>> 10.39.0.35:6800/541337 157 : cluster [WRN] Monitor daemon marked osd.13
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.841995 osd.33 osd.33
>> 10.39.0.37:6810/548230 165 : cluster [WRN] Monitor daemon marked osd.33
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.848627 osd.28 osd.28
>> 10.39.0.37:6808/547680 157 : cluster [WRN] Monitor daemon marked osd.28
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.860501 osd.20 osd.20
>> 10.39.0.36:6812/547465 159 : cluster [WRN] Monitor daemon marked osd.20
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.050876 osd.23 osd.23
>> 10.39.0.36:6810/547790 161 : cluster [WRN] Monitor daemon marked osd.23
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.853694 osd.38 osd.38
>> 10.39.0.38:6801/552512 153 : cluster [WRN] Monitor daemon marked osd.38
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.863745 osd.49 osd.49
>> 10.39.0.39:6805/561776 153 : cluster [WRN] Monitor daemon marked osd.49
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:28.784280 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6852 : cluster [WRN] Health check update: Reduced data
>> availability: 38 pgs peering (PG_AVAILABILITY)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:28.784332 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6853 : cluster [INF] Health check cleared: PG_DEGRADED
>> (was: Degraded data redundancy: 420/332661 objects degraded (0.126%), 1 pg
>> degraded)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:28.784372 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6854 : cluster [INF] Health check cleared: TOO_FEW_PGS
>> (was: too few PGs per OSD (28 < min 30))
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.839907 osd.27 osd.27
>> 10.39.0.37:6803/547560 151 : cluster [WRN] Monitor daemon marked osd.27
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.840611 osd.31 osd.31
>> 10.39.0.37:6807/548009 151 : cluster [WRN] Monitor daemon marked osd.31
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.842664 osd.17 osd.17
>> 10.39.0.35:6806/541774 157 : cluster [WRN] Monitor daemon marked osd.17
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.846408 osd.61 osd.61
>> 10.39.0.40:6801/571166 159 : cluster [WRN] Monitor daemon marked osd.61
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.859087 osd.2 osd.2
>> 10.39.0.34:6808/546587 157 : cluster [WRN] Monitor daemon marked osd.2
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.861856 osd.46 osd.46
>> 10.39.0.39:6807/561444 155 : cluster [WRN] Monitor daemon marked osd.46
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.843770 osd.34 osd.34
>> 10.39.0.37:6802/548338 151 : cluster [WRN] Monitor daemon marked osd.34
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.845353 osd.18 osd.18
>> 10.39.0.36:6805/547240 165 : cluster [WRN] Monitor daemon marked osd.18
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.860607 osd.41 osd.41
>> 10.39.0.38:6802/552890 157 : cluster [WRN] Monitor daemon marked osd.41
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.861365 osd.36 osd.36
>> 10.39.0.38:6814/552289 159 : cluster [WRN] Monitor daemon marked osd.36
>> down, but it is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:30.790492 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6855 : cluster [INF] Health check cleared:
>> PG_AVAILABILITY (was: Reduced data availability: 38 pgs peering)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:30.790549 mon.ceph-mon1 mon.0
>> 10.39.0.34:6789/0 6856 : cluster [INF] Cluster is now healthy
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>> and the ceph-osd.2.log:
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.857 7fd39aefc700  0
>> log_channel(cluster) log [WRN] : Monitor daemon marked osd.2 down, but it
>> is still running
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.857 7fd39aefc700  0
>> log_channel(cluster) log [DBG] : map e13907 wrongly marked me down at e13907
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.857 7fd39aefc700  1 osd.2
>> 13907 start_waiting_for_healthy
>> >> > >> >> >>> >>> >>> .....
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.865 7fd39aefc700  1 osd.2
>> 13907 is_healthy false -- only 0/10 up peers (less than 33%)
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:21.865 7fd39aefc700  1 osd.2
>> 13907 not healthy; waiting to boot
>> >> > >> >> >>> >>> >>> .....
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.293 7fd3a8f18700  1 osd.2
>> 13907 start_boot
>> >> > >> >> >>> >>> >>> 2019-03-13 11:21:22.725 7fd39aefc700  1 osd.2
>> 13908 state: booting -> active
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>> >>> On Tue, 12 Mar 2019 at 6:08 PM, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>> >>> >>>>
>> >> > >> >> >>> >>> >>>> Hi Kevin,
>> >> > >> >> >>> >>> >>>>
>> >> > >> >> >>> >>> >>>> I'm sure firewalld is disabled on each host.
>> >> > >> >> >>> >>> >>>>
>> >> > >> >> >>> >>> >>>> Well, the network is not the problem. The servers are connected
>> >> > >> >> >>> >>> >>>> to the same switch, and the connection was good when the osds
>> >> > >> >> >>> >>> >>>> were marked as down. There was no interruption or delay.
>> >> > >> >> >>> >>> >>>>
>> >> > >> >> >>> >>> >>>> I restarted the leader monitor daemon and the cluster seems to
>> >> > >> >> >>> >>> >>>> have returned to its normal state.
>> >> > >> >> >>> >>> >>>>
>> >> > >> >> >>> >>> >>>> Thanks.
>> >> > >> >> >>> >>> >>>>
>> >> > >> >> >>> >>> >>>> On Tue, 12 Mar 2019 at 5:44 PM, Kevin Olbrich <[email protected]> wrote:
>> >> > >> >> >>> >>> >>>>>
>> >> > >> >> >>> >>> >>>>> Are you sure that firewalld is stopped and disabled?
>> >> > >> >> >>> >>> >>>>> It looked exactly like that when I missed one host in a test cluster.
>> >> > >> >> >>> >>> >>>>>
>> >> > >> >> >>> >>> >>>>> Kevin
>> >> > >> >> >>> >>> >>>>>
>> >> > >> >> >>> >>> >>>>>
>> >> > >> >> >>> >>> >>>>> On Tue, 12 Mar 2019 at 09:31, Zhenshi Zhou <[email protected]> wrote:
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>> Hi,
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>> I deployed a ceph cluster with good performance, but the logs
>> >> > >> >> >>> >>> >>>>>> indicate that the cluster is not as stable as I think it should be.
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>> The log shows that the monitors periodically mark some osds as down:
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>> I didn't find any useful information in the osd logs.
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>> ceph version 13.2.4 mimic (stable)
>> >> > >> >> >>> >>> >>>>>> OS version CentOS 7.6.1810
>> >> > >> >> >>> >>> >>>>>> kernel version 5.0.0-2.el7
>> >> > >> >> >>> >>> >>>>>>
>> >> > >> >> >>> >>> >>>>>> Thanks.
>> >> > >> >> >>> >>> >>>>>> _______________________________________________
>> >> > >> >> >>> >>> >>>>>> ceph-users mailing list
>> >> > >> >> >>> >>> >>>>>> [email protected]
>> >> > >> >> >>> >>> >>>>>>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> > >> >> >>> >>> >>>
>> >> > >> >> >>> >>>
>> >> > >> >> >>> >>>
>> >> > >> >> >>> >>>
>> >> > >> >> >>> >>> --
>> >> > >> >> >>> >>> Thank you!
>> >> > >> >> >>> >>> HuangJun
>>
>

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
