chenBright opened a new pull request, #3326:
URL: https://github.com/apache/brpc/pull/3326

   ### What problem does this PR solve?
   
   Issue Number: resolve 
   
   Problem Summary:
   
   The legacy v2 handshake ("RDMA" magic + 36B fixed binary HelloMessage)
   had correctness bugs that made the wire format effectively
   unevolvable.
   
   1. Client never drained the "unknown tail" bytes when a peer sent
      msg_len > HELLO_MSG_LEN_MIN(40). Leftover bytes stayed in the
      socket recv buffer and silently corrupted the next ReadFromFd
      (the ACK).
   2. Server had the symmetric version of A.
   3. Server computed the body read length from its LOCAL
      g_rdma_hello_msg_len, implicitly assuming the peer's hello is the
      same length as its own. A longer peer left bytes behind; a shorter
      peer made the read block. Client used the correct compile-time
      constant; the two sides were not symmetric.
   
   Combined, 1/2/3 meant v2 could not safely append a single byte to the
   hello -- even an "optional hint" appended by a newer sender would
   mis-align an older receiver's next read.
   
   ### What is changed and the side effects?
   
   Changed:
   
   v3 handshake is designed which is wire format, magic-namespace-isolated from 
v2:
   
   ```
     [ "RDM3" 4B ][ pb_size 4B big-endian ][ RdmaHello protobuf bytes ]
   ```
   
   with pb_size in (0, 4096]. RdmaHello carries the same 6 base fields
   as v2 plus room to append future capabilities.
   
   **Why protobuf (and not "v2 plus length prefix")**
   
   - Variable-length fields are coming. Future capability fields will
     include strings (rdma_device_name, netdev_name, ...) and other
     variable-length data. Supporting them on a fixed-binary protocol
     forces us to invent and maintain a TLV layer (per-field type +
     length + value framing, plus version-aware deserialization). That
     is reimplementing protobuf badly. Using protobuf from day one
     costs nothing and is the canonical answer.
   - Fixes v2 bug 1/2/3 generically: pb_size makes the wire
     self-describing, so the receiver never needs to guess the length
     or know the peer's schema version to read the body cleanly.
   - Append-only field evolution out of the box: new optional fields
     cost old receivers nothing -- they're skipped as unknown protobuf
     fields. v2 with a hand-rolled length prefix would still need
     per-field opt-in code on every side.
   - Built-in validation: ParseFromArray fails fast on malformed input;
     required-field presence is enforced at the parse layer, not by
     ad-hoc has_xxx() checks scattered through wire code.
   - bRPC already depends on protobuf -- no new build dependency.
   
   **Why a NEW MAGIC rather than a version field inside protobuf**
   
   - Forces "breaking change" to be a deployment decision visible at
     the wire level. You cannot accidentally ship a backwards-
     incompatible patch via a field-semantics tweak.
   - Server-side dispatch routes by magic to fully independent state
     machines that can't entangle (no `if (version == X)` branches
     anywhere -- this is the abstraction's red line).
   - Any future breaking change bumps the magic ("RDM4", "RDM5", ...).
     v3 fields, once shipped, never change semantics.
   
   **Architecture**
   
   Handshake state machine is split out of RdmaEndpoint:
   
       RdmaHandshake  (abstract base; only 3 virtuals + _ep pointer)
         |-- RdmaHandshakeClientV2 / ServerV2
         |-- RdmaHandshakeClientV3 / ServerV3
         |-- v2_wire::* (file-local helpers shared by v2 leaves)
         `-- v3_wire::* (file-local helpers shared by v3 leaves)
   
   Endpoint-side ProcessHandshakeAt{Client,Server} is protocol-agnostic; 
   subclasses override only SendLocalHello() and ReceiveAndParseRemoteHello(). 
   A ParsedHello struct decouples endpoint-side bring-up (window calculation, 
   BringUpQp) from any wire format.
   
   Side effects:
   - Performance effects:
   
   - Breaking backward compatibility: 
   
   Server-side ALWAYS accepts both v2 and v3 (no gflag, no kill-switch);
   magic routes to fully independent code paths. A single rolling upgrade
   enables v3 fleet-wide.
   
   Client-side picks the wire protocol via gflag with a safe default:
   
       FLAGS_rdma_client_handshake_version  (default 2)
         2 = "RDMA" legacy (zero-regression default)
         3 = "RDM3" protobuf (opt-in once target servers support v3)
   
   Sub-second rollback is one flag flip away. v3 client to v2-only legacy
   server is NOT guaranteed to transparently fall back on the same
   connection -- the supported migration is "upgrade servers first, then
   opt-in clients".
   
   ---
   ### Check List:
   - Please make sure your changes are compilable.
   - When providing us with a new feature, it is best to add related tests.
   - Please follow [Contributor Covenant Code of 
Conduct](https://github.com/apache/brpc/blob/master/CODE_OF_CONDUCT.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to