hubcio opened a new issue, #2386:
URL: https://github.com/apache/iggy/issues/2386
In Iggy's shared-nothing architecture, each shard owns specific partitions
based on consistent hashing. When a client connected to Shard A tries to
access a partition owned by Shard B, Shard A creates a message for Shard B,
sends it, waits for the reply, and sends the response back to the client.
This extra inter-shard round trip adds latency to every such request.
The aim of this task is to implement a TCP connection transfer mechanism, so
that a connection can be transferred to the correct shard. This way we would
eliminate the additional hop.
The idea is to use file descriptor duplication (libc::dup()) to migrate TCP
connections between shards without
client disruption.
Basic flow for SendMessages:
1. Detection: Client sends `SendMessages` to wrong shard (`IggyNamespace`
doesn't match)
2. FD Duplication: Source shard duplicates the TCP socket file descriptor
3. Inter-Shard Message: Send duplicate FD + session metadata to target shard
4. Socket Reconstruction: Target shard converts FD back to `TcpStream`
5. Hand-off:
- Target shard spawns new connection handler
- Source shard exits without closing socket
- Client continues unaware of transfer
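A minimal sketch of steps 2-5, assuming a Unix target and using only the standard library. It swaps `libc::dup()` for `BorrowedFd::try_clone_to_owned()`, which also duplicates the descriptor, and uses an `mpsc` channel as a stand-in for the inter-shard channel. The `SocketTransfer` here is a trimmed-down, hypothetical version of the message, not Iggy's actual type.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::os::fd::{AsFd, FromRawFd, IntoRawFd, RawFd};
use std::sync::mpsc;
use std::thread;

/// Hypothetical, trimmed-down transfer message (illustration only).
struct SocketTransfer {
    fd: RawFd,
    from_shard: u16,
}

/// Runs one transfer over loopback and returns the bytes the client received.
fn transfer_roundtrip() -> std::io::Result<Vec<u8>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut client = TcpStream::connect(listener.local_addr()?)?;
    let (accepted, _peer) = listener.accept()?;

    // Stand-in for the inter-shard channel.
    let (tx, rx) = mpsc::channel::<SocketTransfer>();

    // Target shard: rebuild a TcpStream from the received descriptor.
    let target = thread::spawn(move || -> std::io::Result<()> {
        let msg = rx.recv().expect("source shard dropped the channel");
        // SAFETY: `msg.fd` is an open socket descriptor that nothing else owns.
        let mut stream = unsafe { TcpStream::from_raw_fd(msg.fd) };
        let reply = format!("handled after transfer from shard {}", msg.from_shard);
        stream.write_all(reply.as_bytes())?;
        Ok(()) // `stream` is dropped here, closing the duplicate
    });

    // Source shard: duplicate the descriptor and hand the duplicate over.
    let dup = accepted.as_fd().try_clone_to_owned()?;
    tx.send(SocketTransfer { fd: dup.into_raw_fd(), from_shard: 0 })
        .expect("target shard is gone");
    // Closes only the source's descriptor; the connection stays open
    // for as long as the duplicate is alive.
    drop(accepted);

    target.join().expect("target shard panicked")?;
    let mut reply = Vec::new();
    client.read_to_end(&mut reply)?; // EOF once both descriptors are closed
    Ok(reply)
}

fn main() -> std::io::Result<()> {
    let reply = transfer_roundtrip()?;
    println!("{}", String::from_utf8_lossy(&reply));
    Ok(())
}
```

Note that once `into_raw_fd()` has been called, nothing closes the descriptor automatically, so a failed `send` would leak it in this sketch.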
My general thoughts:
- Introduce a new struct:
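```rust
use std::net::SocketAddr;
use std::os::fd::RawFd;

struct SocketTransfer {
    fd: RawFd,             // duplicated socket file descriptor
    from_shard: u16,       // shard that initiated the transfer
    client_id: u32,
    user_id: u32,
    ip_address: SocketAddr,
    initial_data: Vec<u8>, // presumably bytes already read before the transfer
}
```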
- Add `SocketTransferred` error variant
- Extend `ShardRequestPayload` with `SocketTransfer` variant
- Implement socket transfer handler on receiving shard
- Modify `SendMessages` handler to detect and initiate transfers
- Update connection lifecycle to handle transfer signals
- Add session state preservation (user ID, client metadata)
- Implement proper FD cleanup on failure paths
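One possible way to get FD cleanup on failure paths for free is to carry an `OwnedFd` instead of a `RawFd` in the message, so that dropping an undelivered message closes the duplicate. The sketch below is only an assumption about how this could look: `OwnedSocketTransfer`, `TransferError`, `initiate_transfer`, and `demo` are hypothetical names, and an `mpsc` channel again stands in for the inter-shard channel.

```rust
use std::net::{TcpListener, TcpStream};
use std::os::fd::{AsFd, OwnedFd};
use std::sync::mpsc::{self, Sender};

/// Hypothetical transfer message that owns its descriptor.
struct OwnedSocketTransfer {
    fd: OwnedFd, // closed automatically when the message is dropped
    client_id: u32,
}

/// Hypothetical error type for a failed hand-off.
#[derive(Debug, PartialEq)]
enum TransferError {
    Duplicate,
    TargetUnavailable,
}

/// Duplicates `stream` and tries to hand the duplicate to the target shard.
fn initiate_transfer(
    stream: &TcpStream,
    client_id: u32,
    target: &Sender<OwnedSocketTransfer>,
) -> Result<(), TransferError> {
    let fd = stream
        .as_fd()
        .try_clone_to_owned()
        .map_err(|_| TransferError::Duplicate)?;
    // On failure, `send` returns the message inside the error value;
    // discarding that error drops the message and closes the duplicate.
    target
        .send(OwnedSocketTransfer { fd, client_id })
        .map_err(|_| TransferError::TargetUnavailable)
}

/// Exercises one successful and one failed hand-off over loopback.
fn demo() -> (Result<(), TransferError>, Result<(), TransferError>) {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let _client = TcpStream::connect(listener.local_addr().unwrap()).unwrap();
    let (stream, _peer) = listener.accept().unwrap();
    let (tx, rx) = mpsc::channel();

    // Success path: the target receives the message and takes over the socket.
    let ok = initiate_transfer(&stream, 7, &tx);
    let msg = rx.recv().unwrap();
    assert_eq!(msg.client_id, 7);
    let _moved = TcpStream::from(msg.fd); // safe conversion, no `unsafe` needed

    // Failure path: the target is gone, so the duplicate is closed on drop.
    drop(rx);
    let failed = initiate_transfer(&stream, 7, &tx);
    (ok, failed)
}

fn main() {
    println!("{:?}", demo());
}
```

With this shape the source shard keeps serving the original connection when the hand-off fails, since only the duplicate is closed.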
Preferably implement `SendMessages` first and create a PR for that, then
do the same for `PollMessages`.
Estimate the impact of the initial connection transfer: implement a test in
which a client constantly polls from two different partitions, run it on both
the changed code and master, and compare the results.
Remark: this is not an easy task.