[
https://issues.apache.org/jira/browse/KUDU-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xixu Wang updated KUDU-3447:
----------------------------
Description:
Copying tablets from an old cluster to a new cluster with the command
'kudu local_replica copy_from_remote' is a resource-intensive operation. As
the following picture shows, memory usage reaches as high as 75%, the network
is almost fully saturated (the total network bandwidth is 2Gb/s), and disk
reads are very high (the total disk bandwidth is 200MB/s).
!image-2023-02-09-10-47-58-370.png|width=996,height=369!
If the data size is very large, the copying process will last for a long time.
Other services may be impacted and become unavailable. Therefore it is better
to limit the tablet copying speed to keep the system stable. The goal is
to balance the tablet copying speed against the impact on other services.
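The description does not say how the limit would be implemented. One common way to cap copy throughput is a token bucket, sketched below in Python purely as an illustration. The names TokenBucket and throttled_copy are hypothetical and are not part of Kudu or its command-line tools; the rate and capacity values are placeholders chosen by the caller.

```python
import time


class TokenBucket:
    """Caps average throughput at rate_bytes_per_sec, with bursts up to capacity_bytes.

    Hypothetical helper for illustration only; not a Kudu API.
    """

    def __init__(self, rate_bytes_per_sec, capacity_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_bytes_per_sec)
        self.capacity = float(capacity_bytes)
        self.tokens = float(capacity_bytes)  # the bucket starts full
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Block until nbytes tokens are available, then spend them."""
        if nbytes > self.capacity:
            raise ValueError("chunk larger than bucket capacity")
        self._refill()
        while self.tokens < nbytes:
            # Sleep roughly long enough for the missing tokens to accumulate.
            self.sleep((nbytes - self.tokens) / self.rate)
            self._refill()
        self.tokens -= nbytes


def throttled_copy(chunks, write, bucket):
    """Write each chunk only after the bucket admits it; return bytes copied."""
    total = 0
    for chunk in chunks:
        bucket.consume(len(chunk))
        write(chunk)
        total += len(chunk)
    return total
```

A lower rate leaves more bandwidth for other services at the cost of a longer copy, which is the trade-off this issue asks to make tunable.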
Maybe
was:
Using the command 'kudu local_replica copy_from_remote' to copy tablets
from an old Kudu cluster to a new Kudu cluster, the network will be
saturated by this process. If the data size is very large, this process will
last for a long time. Other services, such as web services, may be impacted and
lose data. As the following picture shows, almost all bandwidth is occupied by
the copying process. Therefore it is better to limit the copying speed.
!https://doc.sensorsdata.cn/download/attachments/283384953/image2022-7-12_15-41-39.png?version=1&modificationDate=1657611700000&api=v2!
> Limit the usage of network bandwidth of tablet copying
> -------------------------------------------------------
>
> Key: KUDU-3447
> URL: https://issues.apache.org/jira/browse/KUDU-3447
> Project: Kudu
> Issue Type: Improvement
> Reporter: Xixu Wang
> Priority: Minor
> Attachments: image-2023-02-09-10-38-50-512.png,
> image-2023-02-09-10-47-58-370.png
>
>
> Copying tablets from an old cluster to a new cluster with the command
> 'kudu local_replica copy_from_remote' is a resource-intensive operation.
> As the following picture shows, memory usage reaches as high as 75%, the
> network is almost fully saturated (the total network bandwidth is 2Gb/s),
> and disk reads are very high (the total disk bandwidth is 200MB/s).
> !image-2023-02-09-10-47-58-370.png|width=996,height=369!
> If the data size is very large, the copying process will last for a long
> time. Other services may be impacted and become unavailable. Therefore it
> is better to limit the tablet copying speed to keep the system stable.
> The goal is to balance the tablet copying speed against the impact on other
> services.
> Maybe
--
This message was sent by Atlassian Jira
(v8.20.10#820010)