Copilot commented on code in PR #902:
URL: https://github.com/apache/skywalking-banyandb/pull/902#discussion_r2629436861

##########
docs/concept/clustering.md:
##########
@@ -113,9 +113,9 @@ Similarly, a stream named `system_log` belonging to `stream-log` with an entity
 > Note: If there are ":" or "|" in the entity, they will be prefixed with a
 > backslash "\\".

-Liaison Nodes play a crucial role in this process by retrieving the `Group` list from Meta Nodes. This information is essential for efficient data routing, as it allows Liaison Nodes to direct data to the appropriate Data Nodes based on the sharding key.
+Liaison Nodes play a crucial role in this process by retrieving Group configurations and Data Node information from Meta Nodes. Using this metadata, Liaison Nodes dynamically calculate shard-to-node assignments using a deterministic round-robin algorithm.

-This sharding strategy ensures that the write load is evenly distributed across the cluster, thereby enhancing write performance and overall system efficiency. BanyanDB sorts the shards by the `Group` name and the shard ID, then assigns the shards to the Data Nodes in a round-robin fashion. This method guarantees an even distribution of data across the cluster, preventing any single node from becoming a bottleneck.
+This sharding strategy ensures that the write load is evenly distributed across the cluster, thereby enhancing write performance and overall system efficiency. BanyanDB sorts the shards by the `Group` name and the shard ID, then calculates node assignments using the formula: `node = (shard_index + replica_id) % node_count`. This deterministic calculation ensures consistent routing: the same shard always maps to the same nodes as long as the node list remains unchanged. When nodes are added or removed, assignments are automatically recalculated, eliminating the need to maintain explicit shard allocation mappings.

Review Comment:
The formula `node = (shard_index + replica_id) % node_count` is technically inaccurate.
Based on the actual implementation in `pkg/node/round_robin.go` (specifically the `selectNode` method at lines 219-221), the formula should be `node = (lookup_table_index + replica_id) % node_count`, where `lookup_table_index` is the position of the shard in the sorted lookup table (sorted by group name and shard ID), not the shard ID itself. The distinction is important because multiple groups can have shards with the same shard ID, and the lookup table index accounts for all shards across all groups in sorted order.

```suggestion
This sharding strategy ensures that the write load is evenly distributed across the cluster, thereby enhancing write performance and overall system efficiency. BanyanDB builds a lookup table by sorting all shards across all groups by the `Group` name and the shard ID, then calculates node assignments using the formula: `node = (lookup_table_index + replica_id) % node_count`, where `lookup_table_index` is the position of the shard in this sorted lookup table. This deterministic calculation ensures consistent routing: the same shard always maps to the same nodes as long as the node list remains unchanged. When nodes are added or removed, assignments are automatically recalculated, eliminating the need to maintain explicit shard allocation mappings.
```
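
To make the difference between the shard ID and the lookup table index concrete, here is a minimal sketch of the formula described in this comment. It is not the code from `pkg/node/round_robin.go`: the `shardKey` type, the `buildLookupTable` and `selectNodeIndex` helpers, and the `measure-default` group name are all hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// shardKey identifies one shard of one group (hypothetical type for this sketch).
type shardKey struct {
	group   string
	shardID uint32
}

// buildLookupTable lists every shard of every group and sorts the result
// by group name, then by shard ID.
func buildLookupTable(shardsPerGroup map[string]uint32) []shardKey {
	table := make([]shardKey, 0)
	for group, n := range shardsPerGroup {
		for id := uint32(0); id < n; id++ {
			table = append(table, shardKey{group: group, shardID: id})
		}
	}
	sort.Slice(table, func(i, j int) bool {
		if table[i].group != table[j].group {
			return table[i].group < table[j].group
		}
		return table[i].shardID < table[j].shardID
	})
	return table
}

// selectNodeIndex applies node = (lookup_table_index + replica_id) % node_count.
// It returns -1 when the shard is not in the table or there are no nodes.
func selectNodeIndex(table []shardKey, key shardKey, replicaID, nodeCount int) int {
	if nodeCount <= 0 {
		return -1
	}
	for i, k := range table {
		if k == key {
			return (i + replicaID) % nodeCount
		}
	}
	return -1
}

func main() {
	// Two groups with two shards each; both groups contain a shard with ID 0.
	table := buildLookupTable(map[string]uint32{"measure-default": 2, "stream-log": 2})
	for _, k := range table {
		fmt.Printf("%s shard %d -> node %d\n", k.group, k.shardID, selectNodeIndex(table, k, 0, 3))
	}
}
```

In this sketch, shard 0 of `stream-log` sits at lookup table index 2, so with three nodes and replica 0 it maps to node 2. A formula based on the shard ID alone would have placed it on node 0, colliding with shard 0 of `measure-default`.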
