[ https://issues.apache.org/jira/browse/KUDU-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xixu Wang updated KUDU-3447:
----------------------------
    Description: 
Copying tablets from an old cluster to a new cluster with the command
kudu local_replica copy_from_remote is a resource-intensive operation. As the
following picture shows, memory usage is as high as 75%, the network is almost
fully saturated (the overall network bandwidth is 2Gb/s), and disk read load is
very high (the overall disk bandwidth is 200MB/s).

!image-2023-02-09-10-47-58-370.png|width=996,height=369!

If the data size is very large, the copying process lasts a long time, and
other services may be impacted or even become unavailable. Therefore, it is
better to limit the tablet copying speed and keep the system stable. The goal
is to balance tablet copying speed against the impact on other services.

As copy_from_remote mainly downloads data from the remote cluster and writes it
to the local file system, limiting the download speed is an effective way to
control resource consumption. There are several algorithms for implementing a
rate limiter; this patch will use the token bucket algorithm implemented by the
Facebook Folly library:
[https://github.com/facebook/folly/blob/main/folly/TokenBucket.h]
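To illustrate how a token bucket throttles a download, here is a minimal,
self-contained C++ sketch. It does NOT use Folly's TokenBucket API and is not
the patch's code; the class TokenBucketLimiter and the function
SimulateThrottledCopy are hypothetical names. It assumes one token equals one
byte: tokens refill at a fixed byte rate up to a burst cap, and the copy loop
must acquire tokens before fetching each chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical byte-rate limiter using the token bucket algorithm.
// One token == one byte. Tokens refill continuously at `rate_bytes_per_sec`
// and accumulate up to `burst_bytes`, which bounds how much can go at once.
class TokenBucketLimiter {
 public:
  using Clock = std::chrono::steady_clock;

  TokenBucketLimiter(double rate_bytes_per_sec, double burst_bytes)
      : rate_(rate_bytes_per_sec),
        burst_(burst_bytes),
        tokens_(burst_bytes),  // start with a full bucket
        last_(Clock::now()) {
    assert(rate_ > 0 && burst_ > 0);
  }

  // Blocks the caller until `bytes` tokens have been consumed. Requests larger
  // than the burst size are consumed in burst-sized pieces so they can't stall.
  void Acquire(double bytes) {
    while (bytes > 0) {
      const double want = std::min(bytes, burst_);
      Refill();
      if (tokens_ >= want) {
        tokens_ -= want;
        bytes -= want;
        continue;
      }
      // Sleep roughly as long as it takes to refill the missing tokens.
      const double deficit = want - tokens_;
      std::this_thread::sleep_for(std::chrono::duration<double>(deficit / rate_));
    }
  }

 private:
  // Adds tokens for the time elapsed since the last refill, capped at burst_.
  void Refill() {
    const auto now = Clock::now();
    const double elapsed = std::chrono::duration<double>(now - last_).count();
    last_ = now;
    tokens_ = std::min(burst_, tokens_ + elapsed * rate_);
  }

  double rate_;
  double burst_;
  double tokens_;
  Clock::time_point last_;
};

// Hypothetical stand-in for a copy loop: "downloads" `total_bytes` in
// `chunk_bytes` pieces, asking the limiter before each chunk is fetched.
// Returns the wall-clock seconds the throttled copy took.
double SimulateThrottledCopy(int64_t total_bytes, int64_t chunk_bytes,
                             TokenBucketLimiter& limiter) {
  const auto start = TokenBucketLimiter::Clock::now();
  for (int64_t done = 0; done < total_bytes; done += chunk_bytes) {
    const int64_t n = std::min(chunk_bytes, total_bytes - done);
    limiter.Acquire(static_cast<double>(n));
    // A real tool would fetch and write `n` bytes here.
  }
  const auto end = TokenBucketLimiter::Clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```

The rate bounds sustained bandwidth while the burst size bounds short spikes.
For example, copying 3 MiB at 4 MiB/s with a 512 KiB burst takes at least
(3 - 0.5) / 4 = 0.625 s, because only the initial full bucket is free. On the
cluster above, a rate well below 250 MB/s (2Gb/s) would leave network headroom
for other services.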

> Limit the usage of network bandwidth of tablet copying 
> -------------------------------------------------------
>
>                 Key: KUDU-3447
>                 URL: https://issues.apache.org/jira/browse/KUDU-3447
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Xixu Wang
>            Priority: Minor
>         Attachments: image-2023-02-09-10-38-50-512.png, image-2023-02-09-10-47-58-370.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)