On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai <[email protected]> wrote:
> Hello Greg,
>
> Thank you for the answer.
> I'm still in doubt about "lossy". What does it mean in this context? I
> can think of different variants:
> 1. The designer of the protocol considers the connection to be "lossy"
> from the start, so connection errors are handled in a higher layer. The
> layer that observed the failure of the connection just logs this event
> and lets the upper layer handle it. This would support your statement
> 'since it's a "lossy" connection we don't need to remember the message
> and resend it.'

This one. :)
The messenger subsystem can be configured as lossy or non-lossy; all the
RADOS connections are lossy, since a failure frequently means we'll have
to retarget the operation anyway (to a different OSD). CephFS uses the
stateful connections a bit more.
-Greg

> 2. A connection is not declared "lossy" as long as it is working properly.
> Once it has lost some packets or some error threshold is reached, we
> declare the connection lossy, inform the higher layer, and let it decide
> what to do next. Compared with point 1 the actions are quite similar, but
> the usage of "lossy" is different. In point 1 a connection is always
> "lossy" even if it is not actually losing any packets. In the second case
> the connection becomes "lossy" when errors appear, so "lossy" is a
> runtime state of the connection.
>
> Maybe both are wrong and the truth is a third variant ... :) This is what
> I would like to understand.
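[Editor's note: Greg's lossy-vs-stateful distinction can be sketched in a few lines of toy Python. This is an illustration only, not Ceph's actual C++ messenger; the class and field names are invented.]

```python
# Toy sketch of the messenger policy described above: on connection
# failure, a lossy connection drops its queued messages, while a
# stateful (non-lossy) connection keeps them for resend after reconnect.

class Connection:
    def __init__(self, lossy):
        self.lossy = lossy
        self.pending = []   # messages sent but not yet acknowledged
        self.dropped = []   # messages discarded on failure (lossy only)

    def send(self, msg):
        self.pending.append(msg)

    def fail(self):
        """Simulate a connection failure (TCP error, peer marked down, ...)."""
        if self.lossy:
            # "failed lossy con, dropping message": nothing is resent;
            # the client retargets the op (e.g. to a different OSD) itself.
            self.dropped.extend(self.pending)
            self.pending.clear()
        # A stateful connection keeps self.pending and would replay it
        # once the session is re-established.

rados_con = Connection(lossy=True)     # RADOS-style connection
cephfs_con = Connection(lossy=False)   # CephFS-style stateful session

for con in (rados_con, cephfs_con):
    con.send("osd_op read 2097152~380928")
    con.fail()

print(rados_con.pending, rados_con.dropped)    # lossy: pending emptied
print(cephfs_con.pending, cephfs_con.dropped)  # stateful: pending kept
```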
>
> Kind regards,
> Laszlo
>
>
> On 13.04.2017 00:36, Gregory Farnum wrote:
> > On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai <[email protected]> wrote:
> >> Hello,
> >>
> >> yesterday one of our compute nodes recorded the following message for
> >> one of the ceph connections:
> >>
> >> submit_message osd_op(client.28817736.0:690186
> >> rbd_data.15c046b11ab57b7.00000000000000c4 [read 2097152~380928] 3.6f81364a
> >> ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623,
> >> failed lossy con, dropping message
> >
> > A read message, sent to the OSD at IP 10.12.68.71:6818/6623, is being
> > dropped because the connection has somehow failed; since it's a
> > "lossy" connection we don't need to remember the message and resend
> > it. That failure could be an actual TCP/IP stack error; it could be
> > because a different thread killed the connection and it's now closed.
> >
> > If you've just got one of these and didn't see other problems, it's
> > innocuous; I expect the most common cause for this is an OSD getting
> > marked down while IO is pending to it. :)
> > -Greg
> >
> >> Can someone "decode" the above message, or direct me to some document
> >> where I could read more about it?
> >>
> >> We have ceph 0.94.10.
> >>
> >> Thank you,
> >> Laszlo
> >> _______________________________________________
> >> ceph-users mailing list
> >> [email protected]
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
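[Editor's note: as a rough guide to "decoding" the submit_message line quoted in the thread, the sketch below pulls out its fields with a regex. The field labels are my own, not Ceph identifiers, and the pattern is fitted only to this one example line.]

```python
import re

# The log line quoted in the thread, reassembled onto one line.
line = ('submit_message osd_op(client.28817736.0:690186 '
        'rbd_data.15c046b11ab57b7.00000000000000c4 [read 2097152~380928] '
        '3.6f81364a ack+read+known_if_redirected e3617) v5 remote, '
        '10.12.68.71:6818/6623, failed lossy con, dropping message')

m = re.search(
    r'osd_op\((?P<client_tid>\S+) '  # client id + transaction id
    r'(?P<object>\S+) '              # RBD object name
    r'\[(?P<op>[^\]]+)\] '           # operation: read offset~length (bytes)
    r'(?P<pg>\S+) '                  # placement group
    r'(?P<flags>\S+) '               # op flags
    r'(?P<epoch>e\d+)\)',            # osdmap epoch
    line)

addr = re.search(r'remote, (?P<addr>\S+),', line)

print(m.group('op'))       # read 2097152~380928
print(m.group('pg'))       # 3.6f81364a
print(addr.group('addr'))  # 10.12.68.71:6818/6623 (the OSD's ip:port/nonce)
```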
