[ 
https://issues.apache.org/jira/browse/KAFKA-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516097#comment-14516097
 ] 

Jiangjie Qin commented on KAFKA-2139:
-------------------------------------

[~jjkoshy] that is a good point. Many details still need to be worked out, but 
here is the brief idea.
From the controller's point of view, it processes one event after another. If 
a new event causes some update, the controller does not need to fail or cancel 
the previous request; it just sends a new request to "override" the previous 
one.
For example, suppose we have the following cluster:

Broker0            : t1-p0(leader)  ,                  t2-p0(follower), t2-p1(leader)

Broker1            : t1-p0(follower), t1-p1(leader)  ,                  t2-p1(follower)

Broker2(Controller):                  t1-p1(follower), t2-p0(leader)

If the controller reassigns t2-p1 to broker2, the events will be:
1. PartitionReassignmentEvent -> 
*send LeaderAndIsrRequest to broker2 to make it become follower

2. PartitionIsrChangeEvent(Broker2 enters ISR) -> 
*send LeaderAndIsrRequest to broker1 to make it become leader
*send LeaderAndIsrRequest to broker2 to make it become follower
*send StopReplicaRequest to broker0 to stop the replica
*send UpdateMetadataRequest

So events 1 and 2 are independent events. Suppose broker2 goes down between 
them; the event sequence would become:
1. PartitionReassignmentEvent -> 
*send LeaderAndIsrRequest to broker2 and broker1 to make them become followers
*send LeaderAndIsrRequest to broker0 to make it become leader
*send UpdateMetadataRequest
                                 
3. BrokerDownEvent(Broker2 is down) -> 
*send LeaderAndIsrRequest to migrate all the leaders on broker2 to other 
brokers

So event 3 essentially overrides whatever requests were sent for the previous 
event. In this case, however, partition reassignment won't finish until 
broker2 comes back.
The main idea here is that all the alive brokers will receive the up-to-date 
state from the controller in order. If the controller has the correct state, 
the alive brokers should also be in a good state.
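
The "process events in order, let the newest request override" idea can be 
sketched roughly as follows. This is only an illustration in Python, not Kafka 
code: the class ControllerSketch, its fields, and the event names are 
hypothetical, and a real controller would send requests over the network 
instead of recording them in a dictionary.

```python
from collections import deque


class ControllerSketch:
    """Hypothetical controller model: one event at a time, newest request wins."""

    def __init__(self):
        self.events = deque()  # pending events, in arrival order
        self.latest = {}       # (broker_id, partition) -> (event_name, role)
        self.sent = []         # every request "sent", in send order

    def enqueue(self, event_name, partition, roles):
        """roles maps broker_id -> role requested by this event (assumed shape)."""
        self.events.append((event_name, partition, roles))

    def run(self):
        """Drain the queue strictly in order and return the latest state per target."""
        while self.events:
            event_name, partition, roles = self.events.popleft()
            for broker_id, role in roles.items():
                self.sent.append((broker_id, partition, role))
                # The earlier request is not failed or cancelled;
                # the newer one simply replaces it as the current state.
                self.latest[(broker_id, partition)] = (event_name, role)
        return self.latest
```

Because requests are recorded in send order and only the last one per broker 
and partition is kept as current, a broker that applies them in order ends up 
with the state implied by the most recent event.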

> Add a separate controller message queue with higher priority on broker side 
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-2139
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2139
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> This ticket is meant to work together with KAFKA-2029. 
> There are two issues with the current controller-to-broker messages.
> 1. On the controller side, the messages are sent without synchronization.
> 2. On the broker side, the controller messages share the same queue as client 
> messages.
> The problem here is that brokers process the controller messages for the same 
> partition at different times, and the variation can be large. This causes 
> unnecessary data loss and prolongs preferred leader election, controlled 
> shutdown, partition reassignment, etc.
> KAFKA-2029 tries to add a boundary between messages for different 
> partitions. For example, the leader migration for the next partition won't 
> begin before the leader migration for the previous partition finishes.
> This ticket tries to let the broker process controller messages faster. The 
> idea is to have a separate queue to hold controller messages: if there are 
> controller messages, the KafkaApi thread will take care of those first; 
> otherwise it will process messages from clients.
> These two tickets are not the ultimate solution to the current controller 
> problems; they just mitigate them with minor code changes. Moving forward, we 
> still need to think about rewriting the controller in a cleaner way.
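
The separate-queue idea in the description above could look roughly like the 
sketch below. It is a minimal Python illustration, not the broker's actual 
request channel: the class TwoQueueChannel and its method names are 
hypothetical, and it only models the ordering rule (controller messages first, 
client messages otherwise).

```python
import queue


class TwoQueueChannel:
    """Hypothetical request channel with a dedicated controller queue."""

    def __init__(self):
        self.controller_q = queue.Queue()  # messages from the controller
        self.client_q = queue.Queue()      # messages from ordinary clients

    def put_controller(self, request):
        self.controller_q.put(request)

    def put_client(self, request):
        self.client_q.put(request)

    def next_request(self):
        """Return the next request to handle, or None if both queues are empty."""
        try:
            # Controller messages always go first when any are waiting.
            return self.controller_q.get_nowait()
        except queue.Empty:
            pass
        try:
            return self.client_q.get_nowait()
        except queue.Empty:
            return None
```

A handler thread that calls next_request() in a loop would pick up a 
controller message ahead of client messages that were queued before it, while 
client messages keep their own FIFO order.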



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
