hubcio opened a new issue, #2386:
URL: https://github.com/apache/iggy/issues/2386

   In Iggy's shared-nothing architecture, each shard owns specific partitions
based on consistent hashing. When a client
   connected to Shard A tries to access a partition owned by Shard B, Shard A
creates a message for Shard B, sends it, waits
   for the reply, and sends the response back to the client. This extra
inter-shard round trip adds latency to every such request.
   
   The aim of this task is to implement a TCP connection transfer mechanism, so
that a connection can be transferred to the
   shard that owns the partition. This way we would eliminate the additional hop.
   
   The idea is to use file descriptor duplication (libc::dup()) to migrate TCP 
connections between shards without
   client disruption.
   
   Basic flow for SendMessages:
   
   1. Detection: Client sends `SendMessages` to wrong shard (`IggyNamespace` 
doesn't match)
   2. FD Duplication: Source shard duplicates the TCP socket file descriptor
   3. Inter-Shard Message: Send duplicate FD + session metadata to target shard
   4. Socket Reconstruction: Target shard converts FD back to `TcpStream`
   5. Hand-off:
     - Target shard spawns new connection handler
     - Source shard exits without closing socket
     - Client continues unaware of transfer
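
The flow above can be sketched with the standard library alone. This is a minimal illustration, not Iggy's implementation: it assumes a Unix target, blocking `std::net::TcpStream` sockets, and uses threads plus an `mpsc` channel as a stand-in for shards and the inter-shard message. The helper names `duplicate_for_transfer`, `adopt_transferred`, and `demo` are hypothetical. It also swaps `libc::dup()` for `BorrowedFd::try_clone_to_owned()`, which duplicates the descriptor without needing the `libc` crate.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::fd::{AsFd, FromRawFd, IntoRawFd, RawFd};
use std::sync::mpsc;
use std::thread;

/// Source shard (hypothetical helper): duplicate the socket descriptor and
/// release ownership of the duplicate so it can travel as a plain `RawFd`.
fn duplicate_for_transfer(stream: &TcpStream) -> std::io::Result<RawFd> {
    let dup = stream.as_fd().try_clone_to_owned()?;
    Ok(dup.into_raw_fd())
}

/// Target shard (hypothetical helper): rebuild a `TcpStream` from the raw fd.
///
/// Safety: `fd` must be an open socket descriptor that nothing else owns.
unsafe fn adopt_transferred(fd: RawFd) -> TcpStream {
    unsafe { TcpStream::from_raw_fd(fd) }
}

/// Runs one transfer over loopback and returns what the client received.
fn demo() -> std::io::Result<Vec<u8>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut client = TcpStream::connect(listener.local_addr()?)?;
    let (accepted, _) = listener.accept()?;

    // The channel stands in for the inter-shard message carrying the fd.
    let (tx, rx) = mpsc::channel::<RawFd>();
    let target = thread::spawn(move || -> std::io::Result<()> {
        let fd = rx.recv().expect("source shard dropped the channel");
        let mut stream = unsafe { adopt_transferred(fd) };
        stream.write_all(b"handled by target shard")
    });

    let fd = duplicate_for_transfer(&accepted)?;
    tx.send(fd).expect("target shard is gone");
    // Dropping the source handle closes only the source's descriptor.
    // The connection stays open through the duplicate held by the target.
    drop(accepted);

    target.join().expect("target thread panicked")?;
    let mut reply = Vec::new();
    client.read_to_end(&mut reply)?;
    Ok(reply)
}
```

Calling `demo()` should return the bytes written by the target thread, with the client never reconnecting. One detail worth noting: the source side may close its own descriptor once the duplicate exists, but it must not call `shutdown()` on the socket, because shutdown acts on the underlying connection and would affect the duplicate as well.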
   
   My general thoughts:
   - Introduce a new struct:
   ```rust
     SocketTransfer {
         fd: RawFd,
         from_shard: u16,
         client_id: u32,
         user_id: u32,
         ip_address: SocketAddr,
         initial_data: Vec<u8>
     }
   ```
   - Add `SocketTransferred` error variant
   - Extend `ShardRequestPayload` with `SocketTransfer` variant
   - Implement socket transfer handler on receiving shard
   - Modify `SendMessages` handler to detect and initiate transfers
   - Update connection lifecycle to handle transfer signals
   - Add session state preservation (user ID, client metadata)
   - Implement proper FD cleanup on failure paths
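
One possible shape for the struct and the payload variant is sketched below. It is an assumption-laden illustration rather than the project's actual types: the real `ShardRequestPayload` has other variants that are omitted here, and `from_stream`, `into_stream`, and `demo` are hypothetical names. The sketch deviates from the proposal in one place: it stores an `OwnedFd` instead of a `RawFd`, so that a transfer dropped on a failure path closes the duplicate automatically, which is one way to approach the FD cleanup item.

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::os::fd::{AsFd, AsRawFd, OwnedFd};

/// Hypothetical variant of the proposed struct. `OwnedFd` closes the
/// duplicated descriptor on drop, so an undelivered transfer cannot leak it.
#[derive(Debug)]
struct SocketTransfer {
    fd: OwnedFd,
    from_shard: u16,
    client_id: u32,
    user_id: u32,
    ip_address: SocketAddr,
    /// Bytes already read from the socket that the target must process first.
    initial_data: Vec<u8>,
}

/// Stand-in for the real enum; only the new variant is shown.
#[derive(Debug)]
enum ShardRequestPayload {
    SocketTransfer(SocketTransfer),
}

impl SocketTransfer {
    /// Source shard: duplicate the descriptor and capture session metadata.
    fn from_stream(
        stream: &TcpStream,
        from_shard: u16,
        client_id: u32,
        user_id: u32,
        initial_data: Vec<u8>,
    ) -> std::io::Result<Self> {
        Ok(Self {
            fd: stream.as_fd().try_clone_to_owned()?,
            from_shard,
            client_id,
            user_id,
            ip_address: stream.peer_addr()?,
            initial_data,
        })
    }

    /// Target shard: take ownership of the descriptor as a `TcpStream`.
    fn into_stream(self) -> (TcpStream, Vec<u8>) {
        (TcpStream::from(self.fd), self.initial_data)
    }
}

/// Builds a transfer over loopback and checks that the duplicate is a
/// distinct descriptor that still refers to the same connection.
fn demo() -> std::io::Result<(bool, bool)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let client = TcpStream::connect(listener.local_addr()?)?;
    let (accepted, _) = listener.accept()?;

    let transfer = SocketTransfer::from_stream(&accepted, 0, 1, 1, b"partial frame".to_vec())?;
    let distinct_fd = transfer.fd.as_raw_fd() != accepted.as_raw_fd();

    // Wrap and unwrap as the inter-shard message would.
    let payload = ShardRequestPayload::SocketTransfer(transfer);
    let ShardRequestPayload::SocketTransfer(transfer) = payload;

    let (stream, initial) = transfer.into_stream();
    let same_connection = stream.peer_addr()? == client.local_addr()?;
    Ok((distinct_fd, same_connection && initial == b"partial frame".to_vec()))
}
```

With an owned descriptor, the only place an explicit close is still needed is where the fd has been converted to a raw integer and not yet adopted by the receiver.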
   
   Preferably implement `SendMessages` first and create a PR for that, then do
the same for `PollMessages` later.
   
   Estimate the impact of the initial connection transfer: implement a test in
which a client constantly polls from two different partitions, run it on the
changed code and on master, and compare the results.
   
   Remark: this is not an easy task.

