[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241774#comment-15241774
 ] 

Jason Ruckman commented on KAFKA-1464:
--------------------------------------

Hello Neha, 

One problem we've run into: we run a system where we sometimes replace 
brokers completely, in an automated fashion, and rebalance leadership and 
replicas across them.  When we bring a new broker online, we move some 
partitions to it.  What we see is something like this:

Consider topics A, B, C, each with a replication factor of 3
Consider brokers 1, 2, 3 serving topics A, B, C

A new broker 4 is replacing 1 (maybe the machine died, or whatever)

A and B are relatively small, but C is large

1. Move some leaders and replicas to 4 for A and B from 2 and 3.  Everything is 
fine up to this point.
2. Move some leaders and replicas to 4 for C from 2 and 3. 

At this point, broker 4 is pegged: it is pulling data from 2 and 3 (the other 
two replicas) to catch up, which causes timeouts for the partitions it leads.  
Brokers 2 and 3 are OK because 4 can use only 1/2 of each one's bandwidth to 
replicate, so they still have some bandwidth available to serve requests.
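
To make the bandwidth reasoning concrete, here is a rough back-of-the-envelope 
sketch in Python.  It assumes (none of this is stated above) that every broker 
has the same link capacity, that broker 4's inbound link is the bottleneck, 
and that it pulls evenly from both source brokers.  The function name and the 
1000 Mbit/s figure are made up for illustration.

```python
def replication_load(link_capacity_mbps, source_brokers):
    """Estimate per-broker link usage while one new broker catches up.

    Assumes identical links and an even split of fetch traffic across
    the source brokers; a real cluster will not be this tidy.
    """
    # The new broker pulls as fast as its inbound link allows.
    new_broker_inbound = link_capacity_mbps
    # Each source broker supplies an equal share of that traffic.
    per_source_outbound = new_broker_inbound / source_brokers
    return {
        "new_broker_utilization": new_broker_inbound / link_capacity_mbps,
        "source_utilization": per_source_outbound / link_capacity_mbps,
    }


# Hypothetical numbers: 1000 Mbit/s links, two source brokers (2 and 3).
load = replication_load(link_capacity_mbps=1000, source_brokers=2)
print(load)
```

Under these assumptions the new broker's link is fully used while each source 
broker still has roughly half of its link free for client traffic.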

> Add a throttling option to the Kafka replication tool
> -----------------------------------------------------
>
>                 Key: KAFKA-1464
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1464
>             Project: Kafka
>          Issue Type: New Feature
>          Components: replication
>    Affects Versions: 0.8.0
>            Reporter: mjuarez
>            Assignee: Ismael Juma
>            Priority: Minor
>              Labels: replication, replication-tools
>             Fix For: 0.10.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.
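
One generic way to cap a transfer at a fixed byte rate is a token bucket.  The 
sketch below is only an illustration of that idea in Python; it is not Kafka 
code and not necessarily how this issue was or will be resolved.  The names 
TokenBucket, delay_for and send_throttled are hypothetical.

```python
import time


class TokenBucket:
    """Byte-rate limiter: tokens refill at a fixed rate up to a burst cap."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes):
        """Charge nbytes and return how many seconds the caller should wait."""
        self._refill()
        self.tokens -= nbytes  # may go negative; the debt is paid by waiting
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


def send_throttled(chunks, bucket, send, sleep=time.sleep):
    """Pass each chunk to send(), pausing so the average rate stays capped."""
    for chunk in chunks:
        wait = bucket.delay_for(len(chunk))
        if wait > 0:
            sleep(wait)
        send(chunk)
```

The clock and sleep parameters are injectable only so the limiter can be 
exercised without real delays; a caller would normally leave the defaults.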



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
