platinumhamburg commented on code in PR #2355:
URL: https://github.com/apache/fluss/pull/2355#discussion_r2692863437
##########
website/docs/engine-flink/procedures.md:
##########
@@ -277,4 +277,209 @@ CALL sys.reset_cluster_configs(
CALL sys.reset_cluster_configs(
config_keys => 'kv.rocksdb.shared-rate-limiter.bytes-per-sec',
'datalake.format'
);
+```
+
+## Rebalance Procedures
+
+Fluss provides procedures to rebalance buckets across the cluster based on
workload.
+Rebalancing is typically needed in three scenarios: taking existing
TabletServers
+offline, adding new TabletServers to the cluster, and routine
adjustments for load imbalance.
+
+### add_server_tag
+
+Add a server tag to TabletServers in the cluster. For example, tagging
`tabletServer-0` with `PERMANENT_OFFLINE`
+indicates that `tabletServer-0` is about to be permanently decommissioned, so
during the next rebalance
+all buckets on this node are migrated away.
+
+**Syntax:**
+
+```sql
+CALL [catalog_name.]sys.add_server_tag(
+ tabletServers => 'STRING',
+ serverTag => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `tabletServers` (required): The TabletServer IDs to add tag to. Can be a
single server ID (e.g., `'0'`) or multiple IDs separated by commas (e.g.,
`'0,1,2'`).
+- `serverTag` (required): The tag to add to the TabletServers. Valid values
are:
+ - `'PERMANENT_OFFLINE'`: Indicates the TabletServer is permanently offline
and will be decommissioned. All buckets on this server will be migrated during
the next rebalance.
+ - `'TEMPORARY_OFFLINE'`: Indicates the TabletServer is temporarily offline
(e.g., for upgrading). Buckets may be temporarily migrated but can return after
the server comes back online.
+
+**Returns:** An array with a single element `'success'` if the operation
completes successfully.
+
+**Example:**
+
+```sql title="Flink SQL"
+-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if
different)
+USE fluss_catalog;
+
+-- Add PERMANENT_OFFLINE tag to a single TabletServer
+CALL sys.add_server_tag('0', 'PERMANENT_OFFLINE');
+
+-- Add TEMPORARY_OFFLINE tag to multiple TabletServers
+CALL sys.add_server_tag('1,2,3', 'TEMPORARY_OFFLINE');
+```
+
+### remove_server_tag
+
+Remove server tag from TabletServers in the cluster. This operation is
typically used when a previously tagged TabletServer is ready to return to
normal service, or to cancel a planned offline operation.
+
+**Syntax:**
+
+```sql
+CALL [catalog_name.]sys.remove_server_tag(
+ tabletServers => 'STRING',
+ serverTag => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `tabletServers` (required): The TabletServer IDs to remove tag from. Can be
a single server ID (e.g., `'0'`) or multiple IDs separated by commas (e.g.,
`'0,1,2'`).
+- `serverTag` (required): The tag to remove from the TabletServers. Valid
values are:
+ - `'PERMANENT_OFFLINE'`: Remove the permanent offline tag from the
TabletServer.
+ - `'TEMPORARY_OFFLINE'`: Remove the temporary offline tag from the
TabletServer.
+
+**Returns:** An array with a single element `'success'` if the operation
completes successfully.
+
+**Example:**
+
+```sql title="Flink SQL"
+-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if
different)
+USE fluss_catalog;
+
+-- Remove PERMANENT_OFFLINE tag from a single TabletServer
+CALL sys.remove_server_tag('0', 'PERMANENT_OFFLINE');
+
+-- Remove TEMPORARY_OFFLINE tag from multiple TabletServers
+CALL sys.remove_server_tag('1,2,3', 'TEMPORARY_OFFLINE');
+```
+
+### rebalance
+
+Trigger a rebalance operation to redistribute buckets across TabletServers in
the cluster. This procedure helps balance workload based on specified goals,
such as distributing replicas or leaders evenly across the cluster.
+
+**Syntax:**
+
+```sql
+CALL [catalog_name.]sys.rebalance(
+ priorityGoals => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `priorityGoals` (required): The rebalance goals to achieve, specified as
goal types. Can be a single goal (e.g., `'REPLICA_DISTRIBUTION'`) or multiple
goals separated by commas (e.g., `'REPLICA_DISTRIBUTION,LEADER_DISTRIBUTION'`).
Valid goal types are:
+ - `'REPLICA_DISTRIBUTION'`: Generates replica movement tasks to ensure the
number of replicas on each TabletServer is near balanced.
+ - `'LEADER_DISTRIBUTION'`: Generates leadership movement and leader
replica movement tasks to ensure the number of leader replicas on each
TabletServer is near balanced.
+
+**Returns:** An array with a single element containing the rebalance ID (e.g.,
`'rebalance-12345'`), which can be used to track or cancel the rebalance
operation.
+
+**Important Notes:**
+
+- Multiple goals can be specified in priority order. The system will attempt
to achieve goals in the order specified.
+- Rebalance operations run asynchronously in the background. Use the returned
rebalance ID to monitor progress.
+- The rebalance operation respects server tags set by `add_server_tag`. For
example, servers marked with `PERMANENT_OFFLINE` will have their buckets
migrated away.
+
+**Example:**
+
+```sql title="Flink SQL"
+-- Use the Fluss catalog (replace 'fluss_catalog' with your catalog name if
different)
+USE fluss_catalog;
+
+-- Trigger rebalance with replica distribution goal
+CALL sys.rebalance('REPLICA_DISTRIBUTION');
+
+-- Trigger rebalance with multiple goals in priority order
+CALL sys.rebalance('REPLICA_DISTRIBUTION,LEADER_DISTRIBUTION');
+```
+
+### list_rebalance
+
+Query the progress and status of a rebalance operation. This procedure allows
you to monitor ongoing or completed rebalance operations to track their
progress and view detailed information about bucket movements.
+
+**Syntax:**
+
+```sql
+-- List the most recent rebalance progress
+CALL [catalog_name.]sys.list_rebalance()
+
+-- List a specific rebalance progress by ID
+CALL [catalog_name.]sys.list_rebalance(
+ rebalanceId => 'STRING'
+)
+```
+
+**Parameters:**
+
+- `rebalanceId` (optional): The rebalance ID to query. If omitted, returns the
progress of the most recent rebalance operation. The rebalance ID is returned
when calling the `rebalance` procedure.
+
+**Returns:** An array of strings containing:
+- Rebalance ID: The unique identifier of the rebalance operation
+- Rebalance total status: The overall status of the rebalance. Possible values
are:
+ - `NOT_STARTED`: The rebalance has been created but not yet started
+ - `REBALANCING`: The rebalance is currently in progress
+ - `COMPLETED`: The rebalance has successfully completed
+ - `FAILED`: The rebalance has failed
+ - `CANCELED`: The rebalance has been canceled
+- Rebalance progress: The completion percentage (e.g., `75.5%`)
+- Rebalance detail progress for bucket: Detailed progress information for each
bucket being moved
+
+If no rebalance is found, returns: `"No rebalance progress found."`
Review Comment:
Returning empty lines would be better.
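
To illustrate the suggestion, here is a minimal sketch contrasting the two return conventions. It is written in Python for brevity only; the actual procedure lives in Fluss's Java code, and every function name below is hypothetical. With a sentinel message, a caller that only checks for emptiness wrongly concludes that a rebalance exists; with an empty result, the plain emptiness check is enough.

```python
from typing import List, Optional

# Message quoted from the documentation under review.
NO_PROGRESS_MESSAGE = "No rebalance progress found."


def list_rebalance_with_message(progress: Optional[List[str]]) -> List[str]:
    """Documented behavior (hypothetical model): sentinel message when nothing is found."""
    if not progress:
        return [NO_PROGRESS_MESSAGE]
    return list(progress)


def list_rebalance_empty(progress: Optional[List[str]]) -> List[str]:
    """Suggested behavior (hypothetical model): empty result when nothing is found."""
    return list(progress) if progress else []


def has_progress(rows: List[str]) -> bool:
    """A caller-side check that relies only on emptiness."""
    return len(rows) > 0


if __name__ == "__main__":
    # The sentinel string looks like a real row to an emptiness check.
    print(has_progress(list_rebalance_with_message(None)))
    # The empty result is unambiguous.
    print(has_progress(list_rebalance_empty(None)))
```

If the empty-result convention is adopted, the "If no rebalance is found" sentence in the docs would need to change accordingly.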