[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146250#comment-15146250
 ] 

Neha Narkhede commented on KAFKA-1464:
--------------------------------------

The most useful resource to throttle is the network bandwidth used by 
replication, as measured by the total rate of outgoing replication data on 
every leader. What we are looking for is the ability for every leader to cap 
the data it transfers at an upper limit. This can be a config option similar 
to the one we have for the log cleaner. It seems to me that it is better to 
have the leader send less than to have the replica fetch less, as the leader 
has a holistic view of the total amount of data being transferred out.
Data transferred from a leader includes:
1. Fetch requests from an in-sync replica
2. Fetch requests from an out-of-sync replica of a partition being reassigned
3. Fetch requests from an out-of-sync replica of a partition not being reassigned
Data transferred across #1, #2, and #3 combined should stay roughly within the 
configured upper limit. If the limit is crossed, we want to start throttling 
all requests except the ones that fall under #1. The leader can divide the 
remaining available bandwidth amongst partitions that fall under #2 and #3, 
allowing more bandwidth to #3, since presumably it is fine to let partitions 
being reassigned catch up more slowly than the rest. Throttling could involve 
returning fewer bytes, as determined by this computation, for each such 
partition, as Jay suggests.
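
To make the proposed split concrete, here is a rough Python sketch of how a 
leader might divide one interval's byte budget. It is only an illustration of 
the idea above, not Kafka code: the names (FetchDemand, allocate), the 
per-interval budget, and the 3:1 weighting in favour of #3 over #2 are all 
assumptions made for the example.

```python
from dataclasses import dataclass

# Categories of fetch traffic, numbered as in the comment above.
IN_SYNC = 1        # in-sync replica: never throttled
REASSIGNING = 2    # out-of-sync replica, partition being reassigned
CATCHING_UP = 3    # out-of-sync replica, partition not being reassigned

# Hypothetical weights: #3 gets a larger share than #2 (assumption).
DEFAULT_WEIGHTS = {REASSIGNING: 1, CATCHING_UP: 3}


@dataclass
class FetchDemand:
    partition: str
    category: int
    requested_bytes: int


def allocate(demands, limit_bytes, weights=DEFAULT_WEIGHTS):
    """Return {partition: bytes the leader may send} for one interval."""
    grants = {}

    # Category #1 is always served in full, even if it alone exceeds the limit.
    in_sync_total = 0
    for d in demands:
        if d.category == IN_SYNC:
            grants[d.partition] = d.requested_bytes
            in_sync_total += d.requested_bytes

    remaining = max(0, limit_bytes - in_sync_total)
    throttled = [d for d in demands if d.category != IN_SYNC]

    # Under the limit: nothing needs throttling.
    if sum(d.requested_bytes for d in throttled) <= remaining:
        for d in throttled:
            grants[d.partition] = d.requested_bytes
        return grants

    # Over the limit: split what is left by weight and return fewer bytes.
    # This simple version does not redistribute any unused share.
    total_weight = sum(weights[d.category] for d in throttled)
    for d in throttled:
        share = remaining * weights[d.category] // total_weight
        grants[d.partition] = min(d.requested_bytes, share)
    return grants
```

With a 1000-byte budget, an in-sync fetch of 400 bytes is served in full, and 
two 500-byte out-of-sync fetches share the remaining 600 bytes: 150 to the 
partition being reassigned and 450 to the one that is not.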

> Add a throttling option to the Kafka replication tool
> -----------------------------------------------------
>
>                 Key: KAFKA-1464
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1464
>             Project: Kafka
>          Issue Type: New Feature
>          Components: replication
>    Affects Versions: 0.8.0
>            Reporter: mjuarez
>            Assignee: Ismael Juma
>            Priority: Minor
>              Labels: replication, replication-tools
>             Fix For: 0.9.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
