platinumhamburg commented on code in PR #2355:
URL: https://github.com/apache/fluss/pull/2355#discussion_r2692863437


##########
website/docs/engine-flink/procedures.md:
##########
@@ -277,4 +277,209 @@ CALL sys.reset_cluster_configs(
 CALL sys.reset_cluster_configs(
   config_keys => 'kv.rocksdb.shared-rate-limiter.bytes-per-sec', 'datalake.format'
 );
+```
+
+## Rebalance Procedures
+
+Fluss provides procedures to rebalance buckets across the cluster based on workload.
+Rebalancing is typically needed in three scenarios: taking existing TabletServers offline,
+adding new TabletServers to the cluster, and routine adjustments for load imbalance.
+
+### add_server_tag
+
+Add a server tag to TabletServers in the cluster. For example, tagging `tabletServer-0` with `PERMANENT_OFFLINE`
+indicates that `tabletServer-0` is about to be permanently decommissioned, so during the next rebalance
+all buckets on this node are migrated away.
+
+**Syntax:**
+
+```sql
+CALL [catalog_name.]sys.add_server_tag(
+  tabletServers => 'STRING',
+  serverTag => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `tabletServers` (required): The IDs of the TabletServers to tag. Can be a single server ID (e.g., `'0'`) or multiple IDs separated by commas (e.g., `'0,1,2'`).
+- `serverTag` (required): The tag to add to the TabletServers. Valid values are:
+    - `'PERMANENT_OFFLINE'`: Indicates the TabletServer is permanently offline and will be decommissioned. All buckets on this server will be migrated during the next rebalance.
+    - `'TEMPORARY_OFFLINE'`: Indicates the TabletServer is temporarily offline (e.g., for upgrading). Buckets may be temporarily migrated but can return after the server comes back online.
+
+**Returns:** An array with a single element `'success'` if the operation completes successfully.
+
+**Example:**
+
+```sql title="Flink SQL"
+-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
+USE fluss_catalog;
+
+-- Add PERMANENT_OFFLINE tag to a single TabletServer
+CALL sys.add_server_tag('0', 'PERMANENT_OFFLINE');
+
+-- Add TEMPORARY_OFFLINE tag to multiple TabletServers
+CALL sys.add_server_tag('1,2,3', 'TEMPORARY_OFFLINE');
+```
+
+### remove_server_tag
+
+Remove a server tag from TabletServers in the cluster. This operation is typically used when a previously tagged TabletServer is ready to return to normal service, or to cancel a planned offline operation.
+
+**Syntax:**
+
+```sql
+CALL [catalog_name.]sys.remove_server_tag(
+  tabletServers => 'STRING',
+  serverTag => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `tabletServers` (required): The IDs of the TabletServers to remove the tag from. Can be a single server ID (e.g., `'0'`) or multiple IDs separated by commas (e.g., `'0,1,2'`).
+- `serverTag` (required): The tag to remove from the TabletServers. Valid values are:
+    - `'PERMANENT_OFFLINE'`: Remove the permanent offline tag from the TabletServer.
+    - `'TEMPORARY_OFFLINE'`: Remove the temporary offline tag from the TabletServer.
+
+**Returns:** An array with a single element `'success'` if the operation completes successfully.
+
+**Example:**
+
+```sql title="Flink SQL"
+-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
+USE fluss_catalog;
+
+-- Remove PERMANENT_OFFLINE tag from a single TabletServer
+CALL sys.remove_server_tag('0', 'PERMANENT_OFFLINE');
+
+-- Remove TEMPORARY_OFFLINE tag from multiple TabletServers
+CALL sys.remove_server_tag('1,2,3', 'TEMPORARY_OFFLINE');
+```
+
+### rebalance
+
+Trigger a rebalance operation to redistribute buckets across TabletServers in the cluster. This procedure helps balance workload based on specified goals, such as distributing replicas or leaders evenly across the cluster.
+
+**Syntax:**
+
+```sql
+CALL [catalog_name.]sys.rebalance(
+  priorityGoals => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `priorityGoals` (required): The rebalance goals to achieve, specified as goal types. Can be a single goal (e.g., `'REPLICA_DISTRIBUTION'`) or multiple goals separated by commas (e.g., `'REPLICA_DISTRIBUTION,LEADER_DISTRIBUTION'`). Valid goal types are:
+    - `'REPLICA_DISTRIBUTION'`: Generates replica movement tasks to ensure the number of replicas on each TabletServer is nearly balanced.
+    - `'LEADER_DISTRIBUTION'`: Generates leadership movement and leader replica movement tasks to ensure the number of leader replicas on each TabletServer is nearly balanced.
+
+**Returns:** An array with a single element containing the rebalance ID (e.g., `'rebalance-12345'`), which can be used to track or cancel the rebalance operation.
+
+**Important Notes:**
+
+- Multiple goals can be specified in priority order. The system will attempt to achieve goals in the order specified.
+- Rebalance operations run asynchronously in the background. Use the returned rebalance ID to monitor progress.
+- The rebalance operation respects server tags set by `add_server_tag`. For example, servers marked with `PERMANENT_OFFLINE` will have their buckets migrated away.
+
+**Example:**
+
+```sql title="Flink SQL"
+-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if different)
+USE fluss_catalog;
+
+-- Trigger rebalance with replica distribution goal
+CALL sys.rebalance('REPLICA_DISTRIBUTION');
+
+-- Trigger rebalance with multiple goals in priority order
+CALL sys.rebalance('REPLICA_DISTRIBUTION,LEADER_DISTRIBUTION');
+```
+
+### list_rebalance
+
+Query the progress and status of a rebalance operation. Use this procedure to monitor ongoing or completed rebalance operations and to view detailed information about bucket movements.
+
+**Syntax:**
+
+```sql
+-- List the most recent rebalance progress
+CALL [catalog_name.]sys.list_rebalance()
+
+-- List a specific rebalance progress by ID
+CALL [catalog_name.]sys.list_rebalance(
+  rebalanceId => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `rebalanceId` (optional): The rebalance ID to query. If omitted, returns the progress of the most recent rebalance operation. The rebalance ID is returned when calling the `rebalance` procedure.
+
+**Returns:** An array of strings containing:
+- Rebalance ID: The unique identifier of the rebalance operation
+- Rebalance total status: The overall status of the rebalance. Possible values are:
+    - `NOT_STARTED`: The rebalance has been created but not yet started
+    - `REBALANCING`: The rebalance is currently in progress
+    - `COMPLETED`: The rebalance has successfully completed
+    - `FAILED`: The rebalance has failed
+    - `CANCELED`: The rebalance has been canceled
+- Rebalance progress: The completion percentage (e.g., `75.5%`)
+- Rebalance detail progress for bucket: Detailed progress information for each bucket being moved
+
+If no rebalance is found, returns: `"No rebalance progress found."`

Review Comment:
   Returning empty lines would be better.
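
   For reference, the two `list_rebalance` call forms this note concerns could be exercised as in the sketch below. It assumes the positional-argument style used by the other examples in this hunk and reuses the illustrative ID `'rebalance-12345'` from the `rebalance` section; the hunk itself does not show an example for `list_rebalance`, so treat both as assumptions.

   ```sql
   -- Sketch only: assumes a catalog named 'fluss_catalog', as in the examples above
   USE fluss_catalog;

   -- Query the most recent rebalance (no argument)
   CALL sys.list_rebalance();

   -- Query a specific rebalance; 'rebalance-12345' is an illustrative ID, not a real one
   CALL sys.list_rebalance('rebalance-12345');
   ```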


