Copilot commented on code in PR #902:
URL: https://github.com/apache/skywalking-banyandb/pull/902#discussion_r2629436861


##########
docs/concept/clustering.md:
##########
@@ -113,9 +113,9 @@ Similarly, a stream named `system_log` belonging to `stream-log` with an entity
 
 > Note: If there are ":" or "|" in the entity, they will be prefixed with a 
 > backslash "\\".
 
-Liaison Nodes play a crucial role in this process by retrieving the `Group` list from Meta Nodes. This information is essential for efficient data routing, as it allows Liaison Nodes to direct data to the appropriate Data Nodes based on the sharding key.
+Liaison Nodes play a crucial role in this process by retrieving Group configurations and Data Node information from Meta Nodes. Using this metadata, Liaison Nodes dynamically calculate shard-to-node assignments using a deterministic round-robin algorithm.
 
-This sharding strategy ensures that the write load is evenly distributed across the cluster, thereby enhancing write performance and overall system efficiency. BanyanDB sorts the shards by the `Group` name and the shard ID, then assigns the shards to the Data Nodes in a round-robin fashion. This method guarantees an even distribution of data across the cluster, preventing any single node from becoming a bottleneck.
+This sharding strategy ensures that the write load is evenly distributed across the cluster, thereby enhancing write performance and overall system efficiency. BanyanDB sorts the shards by the `Group` name and the shard ID, then calculates node assignments using the formula: `node = (shard_index + replica_id) % node_count`. This deterministic calculation ensures consistent routing: the same shard always maps to the same nodes as long as the node list remains unchanged. When nodes are added or removed, assignments are automatically recalculated, eliminating the need to maintain explicit shard allocation mappings.

Review Comment:
   The formula `node = (shard_index + replica_id) % node_count` is technically inaccurate. Based on the actual implementation in `pkg/node/round_robin.go` (specifically the `selectNode` method at lines 219-221), the formula should be `node = (lookup_table_index + replica_id) % node_count`, where `lookup_table_index` is the position of the shard in the sorted lookup table (sorted by group name and shard ID), not the shard ID itself. The distinction is important because multiple groups can have shards with the same shard ID, and the lookup table index accounts for all shards across all groups in sorted order.
   ```suggestion
   This sharding strategy ensures that the write load is evenly distributed across the cluster, thereby enhancing write performance and overall system efficiency. BanyanDB builds a lookup table by sorting all shards across all groups by the `Group` name and the shard ID, then calculates node assignments using the formula: `node = (lookup_table_index + replica_id) % node_count`, where `lookup_table_index` is the position of the shard in this sorted lookup table. This deterministic calculation ensures consistent routing: the same shard always maps to the same nodes as long as the node list remains unchanged. When nodes are added or removed, assignments are automatically recalculated, eliminating the need to maintain explicit shard allocation mappings.
   ```
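
To make the reviewer's distinction concrete, here is a minimal Go sketch. It is not the code in `pkg/node/round_robin.go`: the `shardKey` type, the `buildLookupTable` and `assignNode` helpers, and the group and node names are all hypothetical. It builds the sorted lookup table across two groups and compares `(lookup_table_index + replica_id) % node_count` with the shard-ID-only formula from the diff.

```go
package main

import (
	"fmt"
	"sort"
)

// shardKey identifies one shard of one group (hypothetical type, for illustration only).
type shardKey struct {
	group   string
	shardID uint32
}

// buildLookupTable lists every shard of every group and sorts the result by
// group name, then shard ID. A shard's position in this slice is its
// lookup_table_index.
func buildLookupTable(shardNums map[string]uint32) []shardKey {
	table := make([]shardKey, 0)
	for group, n := range shardNums {
		for id := uint32(0); id < n; id++ {
			table = append(table, shardKey{group: group, shardID: id})
		}
	}
	sort.Slice(table, func(i, j int) bool {
		if table[i].group != table[j].group {
			return table[i].group < table[j].group
		}
		return table[i].shardID < table[j].shardID
	})
	return table
}

// assignNode (hypothetical helper) applies
// node = (lookup_table_index + replica_id) % node_count.
// It returns false for an unknown shard, a negative replica ID, or an empty node list.
// A linear scan keeps the sketch short; a real implementation would probably index the table.
func assignNode(table []shardKey, nodes []string, key shardKey, replicaID int) (string, bool) {
	if len(nodes) == 0 || replicaID < 0 {
		return "", false
	}
	for idx, k := range table {
		if k == key {
			return nodes[(idx+replicaID)%len(nodes)], true
		}
	}
	return "", false
}

func main() {
	nodes := []string{"data-0", "data-1", "data-2"}
	table := buildLookupTable(map[string]uint32{"group-a": 2, "group-b": 2})
	for _, k := range table {
		byIndex, _ := assignNode(table, nodes, k, 0)
		// The shard-ID-only formula from the diff, for comparison.
		byShardID := nodes[int(k.shardID)%len(nodes)]
		fmt.Printf("%s/%d -> lookup index: %s, shard ID only: %s\n", k.group, k.shardID, byIndex, byShardID)
	}
}
```

With two groups of two shards each on three Data Nodes, the lookup-table index places the four primaries on `data-0`, `data-1`, `data-2`, `data-0`. The shard-ID-only formula places them on `data-0`, `data-1`, `data-0`, `data-1`, so `group-a/0` and `group-b/0` collide on `data-0` and `data-2` receives no primary at all.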



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
