On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai <[email protected]> wrote:
> Hello Greg,
>
> Thank you for the answer.
> I'm still in doubt about "lossy". What does it mean in this context? I
> can think of different variants:
> 1. The designer of the protocol considers the connection to be "lossy"
> from the start, so connection errors are handled in a higher layer. The
> layer that observed the failure of the connection just logs this event
> and lets the upper layer handle it. This would support your statement
> 'since it's a "lossy" connection we don't need to remember the message
> and resend it.'

This one. :)
The messenger subsystem can be configured as lossy or non-lossy; all the
RADOS connections are lossy, since a failure frequently means we'll have
to retarget the operation anyway (to a different OSD). CephFS uses the
stateful connections a bit more.
-Greg

> 2. A connection is not declared "lossy" as long as it is working properly.
> Once it has lost some packets or some error threshold is reached, we
> declare the connection lossy, inform the higher layer, and let it decide
> what to do next. Compared with point 1 the actions are quite similar, but
> the usage of "lossy" is different. In point 1 a connection is always
> "lossy" even if it is not actually losing any packets. In the second case
> the connection becomes "lossy" when errors appear, so "lossy" is a
> runtime state of the connection.
>
> Maybe both are wrong and the truth is a third variant ... :) This is what
> I would like to understand.
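[Editor's note: Greg's lossy-vs-stateful distinction can be sketched in a few lines of toy Python. This is an illustration only, not Ceph's actual C++ messenger; the class and field names are invented.]

```python
# Toy sketch of the messenger policy described above: on connection
# failure, a lossy connection drops its queued messages, while a
# stateful (non-lossy) connection keeps them for resend after reconnect.

class Connection:
    def __init__(self, lossy):
        self.lossy = lossy
        self.pending = []   # messages sent but not yet acknowledged
        self.dropped = []   # messages discarded on failure (lossy only)

    def send(self, msg):
        self.pending.append(msg)

    def fail(self):
        """Simulate a connection failure (TCP error, peer marked down, ...)."""
        if self.lossy:
            # "failed lossy con, dropping message": nothing is resent;
            # the client retargets the op (e.g. to a different OSD) itself.
            self.dropped.extend(self.pending)
            self.pending.clear()
        # A stateful connection keeps self.pending and would replay it
        # once the session is re-established.

rados_con = Connection(lossy=True)     # RADOS-style connection
cephfs_con = Connection(lossy=False)   # CephFS-style stateful session

for con in (rados_con, cephfs_con):
    con.send("osd_op read 2097152~380928")
    con.fail()

print(rados_con.pending, rados_con.dropped)    # lossy: pending emptied
print(cephfs_con.pending, cephfs_con.dropped)  # stateful: pending kept
```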
>
> Kind regards,
> Laszlo
>
>
> On 13.04.2017 00:36, Gregory Farnum wrote:
> > On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai <[email protected]> wrote:
> >> Hello,
> >>
> >> yesterday one of our compute nodes recorded the following message for
> >> one of the ceph connections:
> >>
> >> submit_message osd_op(client.28817736.0:690186
> >> rbd_data.15c046b11ab57b7.00000000000000c4 [read 2097152~380928] 3.6f81364a
> >> ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623,
> >> failed lossy con, dropping message
> >
> > A read message, sent to the OSD at IP 10.12.68.71:6818/6623, is being
> > dropped because the connection has somehow failed; since it's a
> > "lossy" connection we don't need to remember the message and resend
> > it. That failure could be an actual TCP/IP stack error; it could be
> > because a different thread killed the connection and it's now closed.
> >
> > If you've just got one of these and didn't see other problems, it's
> > innocuous; I expect the most common cause for this is an OSD getting
> > marked down while IO is pending to it. :)
> > -Greg
> >
> >> Can someone "decode" the above message, or direct me to some document
> >> where I could read more about it?
> >>
> >> We have ceph 0.94.10.
> >>
> >> Thank you,
> >> Laszlo
> >> _______________________________________________
> >> ceph-users mailing list
> >> [email protected]
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
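[Editor's note: as a rough guide to "decoding" the submit_message line quoted in the thread, the sketch below pulls out its fields with a regex. The field labels are my own, not Ceph identifiers, and the pattern is fitted only to this one example line.]

```python
import re

# The log line quoted in the thread, reassembled onto one line.
line = ('submit_message osd_op(client.28817736.0:690186 '
        'rbd_data.15c046b11ab57b7.00000000000000c4 [read 2097152~380928] '
        '3.6f81364a ack+read+known_if_redirected e3617) v5 remote, '
        '10.12.68.71:6818/6623, failed lossy con, dropping message')

m = re.search(
    r'osd_op\((?P<client_tid>\S+) '  # client id + transaction id
    r'(?P<object>\S+) '              # RBD object name
    r'\[(?P<op>[^\]]+)\] '           # operation: read offset~length (bytes)
    r'(?P<pg>\S+) '                  # placement group
    r'(?P<flags>\S+) '               # op flags
    r'(?P<epoch>e\d+)\)',            # osdmap epoch
    line)

addr = re.search(r'remote, (?P<addr>\S+),', line)

print(m.group('op'))       # read 2097152~380928
print(m.group('pg'))       # 3.6f81364a
print(addr.group('addr'))  # 10.12.68.71:6818/6623 (the OSD's ip:port/nonce)
```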
