Hello fellow Cephers,

My 12.2.2 cluster is pretty full, so I've been adding new nodes/OSDs.
Last week I added two new nodes with 12 OSDs each, and they are still
backfilling; I have max_backfills tuned quite low across the board to
minimize client impact. Yesterday I brought two more nodes online, each
with 12 OSDs, and added them to the crushmap under a staging root,
planning to move them to root=default once the two nodes from last week
finish backfilling. When the OSD processes came up they all did what I
describe below, and since it only takes two reporters from different
hosts, the mons started marking existing OSDs down. So I backed that
out and am now working with a single OSD on one of the new nodes until
I can figure this out.
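
For reference, this is roughly how I set up the staging root and the
backfill throttle (commands reconstructed from memory, so take the
exact values as illustrative):
# ceph osd crush add-bucket staging root
# ceph osd crush move chhq-supcphsn34 root=staging
# ceph tell osd.* injectargs '--osd-max-backfills 1'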

When the OSD process starts up it listens on ports 6800 and 6801 on
both the cluster and public interfaces. It successfully gets the
current osdmap from a monitor and chooses 10 OSDs as heartbeat peers,
all of which then fail their heartbeat checks.
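
(For completeness, raw TCP reachability from the new node to a peer's
heartbeat addresses can be sanity-checked like this; osd.0's two
heartbeat addresses are taken from the log below:)
# timeout 2 bash -c 'cat < /dev/null > /dev/tcp/10.8.78.23/6814' && echo ok
# timeout 2 bash -c 'cat < /dev/null > /dev/tcp/10.8.76.23/6805' && echo ok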

It doesn't appear to be a basic networking issue: I turned debug_osd
and debug_ms up to 20, and based on the following it looks like a
successful ping/reply exchange with the peer (osd.0), yet after a
while the log claims it has never heard from that OSD.
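
For the record, I bumped those levels on the new OSD roughly like this
(admin socket shown; setting them in ceph.conf works just as well):
# ceph daemon osd.403 config set debug_osd 20/20
# ceph daemon osd.403 config set debug_ms 20/20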

2019-03-14 14:17:42.350902 7fe698776700 10 osd.403 103451
_add_heartbeat_peer: new peer osd.0 10.8.78.23:6814/8498484
10.8.76.23:6805/8498484
2019-03-14 14:17:44.165460 7fe68df61700  1 -- 10.8.76.48:0/67279 -->
10.8.76.23:6805/8498484 -- osd_ping(ping e103451 stamp 2019-03-14
14:17:44.165415) v4 -- 0x55844222aa00 con 0
2019-03-14 14:17:44.165467 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
cs=1 l=1).prepare_send_message m osd_ping(ping e103451 stamp
2019-03-14 14:17:44.165415) v4
2019-03-14 14:17:44.165471 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
cs=1 l=1).prepare_send_message encoding features 2305244844532236283
0x55844222aa00 osd_ping(ping e103451 stamp 2019-03-14 14:17:44.165415)
v4
2019-03-14 14:17:44.165691 7fe6a574e700  5 -- 10.8.76.48:0/67279 >>
10.8.76.23:6805/8498484 conn(0x558442368000 :-1
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2349 cs=1 l=1). rx
osd.0 seq 1 0x55844206ba00 osd_ping(ping_reply e103451 stamp
2019-03-14 14:17:44.165415) v4
2019-03-14 14:17:44.165697 7fe6a574e700  1 -- 10.8.76.48:0/67279 <==
osd.0 10.8.76.23:6805/8498484 1 ==== osd_ping(ping_reply e103451 stamp
2019-03-14 14:17:44.165415) v4 ==== 2004+0+0 (4204681659 0 0)
0x55844206ba00 con 0x558442368000

... seq 2-6...

2019-03-14 14:17:57.468338 7fe68df61700  1 -- 10.8.76.48:0/67279 -->
10.8.76.23:6805/8498484 -- osd_ping(ping e103451 stamp 2019-03-14
14:17:57.468301) v4 -- 0x5584422e2c00 con 0
2019-03-14 14:17:57.468343 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
cs=1 l=1).prepare_send_message m osd_ping(ping e103451 stamp
2019-03-14 14:17:57.468301) v4
2019-03-14 14:17:57.468348 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
cs=1 l=1).prepare_send_message encoding features 2305244844532236283
0x5584422e2c00 osd_ping(ping e103451 stamp 2019-03-14 14:17:57.468301)
v4
2019-03-14 14:17:57.468554 7fe6a574e700  5 -- 10.8.76.48:0/67279 >>
10.8.76.23:6805/8498484 conn(0x558442368000 :-1
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2349 cs=1 l=1). rx
osd.0 seq 6 0x55844222a600 osd_ping(ping_reply e103451 stamp
2019-03-14 14:17:57.468301) v4
2019-03-14 14:17:57.468561 7fe6a574e700  1 -- 10.8.76.48:0/67279 <==
osd.0 10.8.76.23:6805/8498484 6 ==== osd_ping(ping_reply e103451 stamp
2019-03-14 14:17:57.468301) v4 ==== 2004+0+0 (306125004 0 0)
0x55844222a600 con 0x558442368000
2019-03-14 14:18:04.266809 7fe6a1f89700 -1 osd.403 103451
heartbeat_check: no reply from 10.8.76.23:6805 osd.0 ever on either
front or back, first ping sent 2019-03-14 14:17:44.165415 (cutoff
2019-03-14 14:17:44.266808)
2019-03-14 14:18:05.267163 7fe6a1f89700 -1 osd.403 103451
heartbeat_check: no reply from 10.8.76.23:6805 osd.0 ever on either
front or back, first ping sent 2019-03-14 14:17:44.165415 (cutoff
2019-03-14 14:17:45.267163)
2019-03-14 14:18:06.267296 7fe6a1f89700 -1 osd.403 103451
heartbeat_check: no reply from 10.8.76.23:6805 osd.0 ever on either
front or back, first ping sent 2019-03-14 14:17:44.165415 (cutoff
2019-03-14 14:17:46.267295)

This whole time other OSDs are reporting osd.403 as failed, so once
the grace period expires the monitor marks it down:
cluster [INF] osd.403 failed (root=staging,host=chhq-supcphsn34) (2
reporters from different host after 25.757343 >= grace 21.560865)
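
(For reference, the settings that govern that failure decision can be
read off the admin sockets; mon id elided here since it will differ
per cluster:)
# ceph daemon osd.403 config get osd_heartbeat_grace
# ceph daemon mon.<id> config get mon_osd_min_down_reporters
# ceph daemon mon.<id> config get mon_osd_reporter_subtree_level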

After about 45 seconds the OSD then stops listening on the
aforementioned ports and binds to different ones. Is this normal?
There's no obvious indication in the OSD log of why or when it does
this.
# date;ss -tlnp|grep 67279
Thu Mar 14 14:18:27 CDT 2019
LISTEN     0      128    10.8.78.40:6800                     *:*
            users:(("ceph-osd",pid=67279,fd=14))
LISTEN     0      128    10.8.76.48:6800                     *:*
            users:(("ceph-osd",pid=67279,fd=13))
LISTEN     0      128    10.8.76.48:6801                     *:*
            users:(("ceph-osd",pid=67279,fd=16))
LISTEN     0      128    10.8.78.40:6801                     *:*
            users:(("ceph-osd",pid=67279,fd=15))
# date;ss -tlnp|grep 67279
Thu Mar 14 14:18:28 CDT 2019
LISTEN     0      128    10.8.76.48:6800                     *:*
            users:(("ceph-osd",pid=67279,fd=13))
LISTEN     0      128    10.8.76.48:6802                     *:*
            users:(("ceph-osd",pid=67279,fd=16))
LISTEN     0      128    10.8.78.40:6802                     *:*
            users:(("ceph-osd",pid=67279,fd=14))
LISTEN     0      128    10.8.78.40:6803                     *:*
            users:(("ceph-osd",pid=67279,fd=15))

But the current osdmap still shows those old ports (probably because
all communication from the OSD to the mons has ceased until the next
reportable event?):
103451
osd.403 up   in  weight 1 up_from 103451 up_thru 0 down_at 103437
last_clean_interval [103436,103442) 10.8.76.48:6800/67279
10.8.78.40:6800/67279 10.8.78.40:6801/67279 10.8.76.48:6801/67279
exists,up f664f770-2b92-43c4-818c-db8b482515c4
103456
osd.403 down in  weight 1 up_from 103451 up_thru 0 down_at 103452
last_clean_interval [103436,103442) 10.8.76.48:6800/67279
10.8.78.40:6800/67279 10.8.78.40:6801/67279 10.8.76.48:6801/67279
exists f664f770-2b92-43c4-818c-db8b482515c4

At this point it doesn't appear that I can add any OSDs to my cluster.
No existing OSDs have been flapping, so I don't think network
congestion or heartbeat traffic in general is the issue. I'm hoping
I've overlooked something simple.

Any help is appreciated!
Thanks,
Josh