Hi,

On Fri, Nov 13, 2020 at 5:58 PM Alexander Aring <[email protected]> wrote:
>
> This patch introduce to make a tcp lowcomms connection reliable even if
> reconnects occurs. This is done by an application layer retransmission
> handling and sequence numbers in dlm protocols. There are three new dlm
> commands:
>
> DLM_OPTS:
>
> This will encapsulate an exisiting dlm message (and rcom message if they
> don't have an own application side retransmission handling). As optional
> handling additional tlv's (type length fields) can be appended. This can
> be for example a sequence number field. However because in DLM_OPTS the
> lockspace field is unused and a sequence number is a mandatory field it
> isn't made as a tlv and we put the sequence number inside the lockspace
> id. The possibilty to add optional options are still there for future
> purposes.
>
> DLM_ACK:
>
> Just a dlm header to ackknowledge the receipe of a DLM_OPTS message to
> it's sender.
>
> DLM_FIN:
>
> A new DLM message to synchronize pending message to the other dlm end if
> the node want to disconnects. Each side waits until it receives this
> message and disconnects. It's important that this message has nothing to
> do with the application logik because it might run in a timeout if

s/logik/logic/

> ackknowledge messages are dropped.
>
> To explain the basic functionality take a look into the
> dlm_midcomms_receive_buffer() function. This function will take care
> that dlm messages are delivered according to their sequence numbers and
> request retransmission via sending ackknowledge messages. However there
> exists three cases:
>
> 1. sequence number is the one which is expected. That means everything
>    is working fine. Additional there is always a check if the next
>    message was already queued for future, this will occur when there was
>    some messages drops before.
>
> 2. A sequence number is in the future, in this case we queue it for might
>    future delivery, see case 1.
>
> 3. A sequence number is in the past, in this case we drop this message
>    because it was already delivered.
>
> To send ackknowledge we always send the sequence number which is
> expected, if the other node sends multiple ackknowledge for the same

s/sends/receives/

> sequence numbers it will trigger a retransmission. In case no ackknowledge
> is send back, a timer with a timeout handling is running and will trigger
> a retranmission as well. Sending multiple acks with the same sequence or
> messages with the same sequence should not have any effects that breaks
> dlm. Only messages in the far future can break dlm, that's why important
> that the closing connection is right synchronized with DLM_FIN which
> also resets the sequence numbers.

s/ackknowledge/acknowledge/ everywhere and s/retranmission/retransmission/

Sorry, I will run aspell on my commit message (I thought checkpatch did
that).

> ...
> +
> +		dlm_receive_buffer(p, nodeid);
> +		return;
> +	case DLM_OPTS:
> +		seq = le32_to_cpu(p->header.u.h_seq);
> +
> +		ret = dlm_opts_check_msglen(p, msglen, nodeid);
> +		if (ret < 0)
> +			return;
> +#if 0
> +		ret = dlm_parse_opts(p->opts.o_opts, p->opts.o_optlen);

le16_to_cpu() is missing for p->opts.o_optlen here.

> +		if (ret < 0)
> +			return;
> +#endif
> +
> +		p = (union dlm_packet *)((unsigned char *)p->opts.o_opts +
> +					 ret);
> +

mhh, this "+ ret" is a leftover from the code disabled above. I disabled
the option parsing with #if 0 because we don't have any opts so far and
simply ignore them; that way we can still change the TLV header. For
example, I wasn't sure whether to use 1-byte or 2-byte fields. Are 2
bytes fine, or do they just add a lot of padding we may never use? They
would give us plenty of room for types and lengths, but with full
messages below 4096 bytes we probably never need a length above 255, and
255 types should be enough as well. Then again, the 4096-byte limit may
change in the future...

- Alex
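For the 1-byte vs. 2-byte question, here is a rough userspace sketch of what a parser for the disabled dlm_parse_opts() call could look like with 2-byte fields. It is not code from the patch: the option layout (2-byte type, 2-byte length, both little-endian, followed by the value), the names parse_opts() and le16_decode(), and OPT_HDR_LEN are all hypothetical. le16_decode() is only a portable stand-in for the kernel's le16_to_cpu(), and the on-wire options length is converted before use, which is the missing step noted above. The return value is the number of option bytes to skip, i.e. what "+ ret" would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the kernel's le16_to_cpu(): decodes two
 * little-endian wire bytes regardless of host byte order. */
static uint16_t le16_decode(const unsigned char *b)
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Hypothetical option header: 2-byte type, 2-byte length, both
 * little-endian, followed by `length` bytes of value. */
#define OPT_HDR_LEN 4

/*
 * Walk the options area and return how many bytes it occupies, so the
 * caller can step over it to the encapsulated message. raw_optlen points
 * at the on-wire little-endian length of the whole area and is converted
 * before use. Returns -1 if a header or value overruns the area.
 */
static int parse_opts(const unsigned char *opts,
		      const unsigned char *raw_optlen)
{
	size_t optlen = le16_decode(raw_optlen);
	size_t off = 0;

	while (off < optlen) {
		if (optlen - off < OPT_HDR_LEN)
			return -1; /* truncated option header */

		uint16_t len = le16_decode(opts + off + 2);

		if (len > optlen - off - OPT_HDR_LEN)
			return -1; /* value runs past the options area */

		/* A real parser would dispatch on le16_decode(opts + off). */
		off += OPT_HDR_LEN + len;
	}
	return (int)off;
}
```

With 1-byte fields the header would shrink from 4 to 2 bytes per option, at the cost of capping both the type space and each option value at 255.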

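The three receive cases from the commit message can be sketched as a small userspace model. This is not dlm_midcomms_receive_buffer() itself: struct rx_state, rx_receive(), the fixed RX_WINDOW of 8 buffered messages and the int payloads are hypothetical simplifications. Comparing 32-bit sequence numbers through a signed difference, so that they survive wraparound, is also an assumption about how the h_seq values would be ordered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_WINDOW 8   /* hypothetical: how far ahead of "expected" we buffer */
#define RX_LOG_MAX 64 /* hypothetical: size of the delivery log */

/* Hypothetical out-of-order slot; payloads are plain ints for brevity. */
struct rx_slot {
	bool used;
	uint32_t seq;
	int payload;
};

struct rx_state {
	uint32_t expected;               /* next sequence number to deliver */
	struct rx_slot slots[RX_WINDOW]; /* indexed by seq % RX_WINDOW */
	int delivered[RX_LOG_MAX];       /* stands in for the upper layer */
	size_t ndelivered;
};

static void deliver(struct rx_state *rx, int payload)
{
	if (rx->ndelivered < RX_LOG_MAX)
		rx->delivered[rx->ndelivered++] = payload;
}

/* Returns the sequence number to acknowledge: always the one expected next. */
static uint32_t rx_receive(struct rx_state *rx, uint32_t seq, int payload)
{
	/* Wraparound-safe distance: negative means "in the past". */
	int32_t d = (int32_t)(seq - rx->expected);

	if (d == 0) {
		/* Case 1: deliver, then drain successors queued earlier. */
		deliver(rx, payload);
		rx->expected++;
		for (;;) {
			struct rx_slot *s = &rx->slots[rx->expected % RX_WINDOW];

			if (!s->used || s->seq != rx->expected)
				break;
			deliver(rx, s->payload);
			s->used = false;
			rx->expected++;
		}
	} else if (d > 0 && d < RX_WINDOW) {
		/* Case 2: future message; keep it (a duplicate just overwrites). */
		struct rx_slot *s = &rx->slots[seq % RX_WINDOW];

		s->used = true;
		s->seq = seq;
		s->payload = payload;
	}
	/* Case 3 (d < 0): already delivered, drop it. Messages too far in the
	 * future are dropped as well and left to the sender's retransmission. */
	return rx->expected;
}
```

The return value is what the receiver would put into a DLM_ACK. When a message is lost, every later arrival returns the same number, and those repeated acks are what tell the sender to retransmit. The sender side and its retransmit timer are not modelled here.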