[ 
https://issues.apache.org/jira/browse/KAFKA-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513511#comment-14513511
 ] 

Gwen Shapira commented on KAFKA-2139:
-------------------------------------

Thanks for the nice doc. Two questions / comments:

1. I'm working on the reuse of o.a.k.common code in KAFKA-1928. Right now I'm 
re-using Selector but not NetworkClient itself, since we need a NetworkServer :)
I assume you are suggesting reusing NetworkClient instead of BlockingChannel, 
but not in SocketServer itself?

2. One issue I ran into is that the Controller uses the same Acceptor and 
Processors as the rest of the broker code. When we start the broker, we start 
accepting messages when the Controller is up (for obvious reasons), but the 
rest of the broker (LogManager, ReplicaManager) may take a bit longer to start. 
Since the network code is already up, we also start receiving requests from 
producers for topics the broker hasn't loaded into memory yet (and therefore 
doesn't know exist). It's a weird state. I haven't seen it in a while, so perhaps 
it got resolved in another patch, but I was wondering if we can avoid this 
kind of issue by having the controller listen on a completely separate port 
with its own acceptor and processors. Then we could start accepting controller 
requests when the controller is up, and broker requests when the rest is up.
This will also let [~toddpalino] run the inter-broker communication on a 
different network.
What do you think?
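A minimal sketch of the staged startup proposed in point 2, assuming hypothetical `Endpoint` and `StagedStartup` classes (not Kafka's SocketServer API) and illustrative port numbers. Each endpoint stands in for a listener with its own acceptor and processors; it refuses new connections until the part of the broker it serves is ready, so producers cannot reach the broker before LogManager and ReplicaManager are loaded.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for one listener with its own acceptor and processors.
final class Endpoint {
    final String name;
    final int port; // illustrative only
    private final AtomicBoolean accepting = new AtomicBoolean(false);

    Endpoint(String name, int port) {
        this.name = name;
        this.port = port;
    }

    void startAccepting() { accepting.set(true); }

    // A real acceptor would not bind or accept before startAccepting();
    // here we simply report whether a new connection would be admitted.
    boolean admits() { return accepting.get(); }
}

// Hypothetical startup sequence: each endpoint opens only when its dependencies are up.
final class StagedStartup {
    final Endpoint controllerEndpoint = new Endpoint("controller", 9093);
    final Endpoint clientEndpoint = new Endpoint("client", 9092);

    // Called once the controller is up.
    void onControllerReady() { controllerEndpoint.startAccepting(); }

    // Called once LogManager and ReplicaManager have loaded, so every local
    // topic is known before any producer request is admitted.
    void onLogAndReplicaManagersReady() { clientEndpoint.startAccepting(); }
}
```

Because the two endpoints are independent, the controller endpoint could also be bound to a different network interface, which is what would allow inter-broker traffic to run on a separate network.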

> Add a separate controller message queue with higher priority on broker side 
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-2139
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2139
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> This ticket is meant to work together with KAFKA-2029. 
> There are two issues with the current controller-to-broker messages:
> 1. On the controller side, the messages are sent without synchronization.
> 2. On the broker side, the controller messages share the same queue as client 
> messages.
> The problem here is that brokers process the controller messages for the same 
> partition at different times, and the variation can be large. This causes 
> unnecessary data loss and prolongs preferred leader election, controlled 
> shutdown, partition reassignment, etc.
> KAFKA-2029 was trying to add a boundary between messages for different 
> partitions. For example, the leader migration for the next partition won't 
> begin until the leader migration for the previous partition finishes.
> This ticket is trying to let the broker process controller messages faster. 
> The idea is to have a separate queue to hold controller messages: if there 
> are controller messages, the KafkaApi thread will take care of those first; 
> otherwise it will process messages from clients.
> These two tickets are not the ultimate solution to the current controller 
> problems; they just mitigate them with minor code changes. Moving forward, we 
> still need to think about rewriting the controller in a cleaner way.
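A minimal Java sketch of the dispatch rule described in the ticket (controller messages first, client messages otherwise), assuming hypothetical names (`PrioritizedRequestChannel`, `Request`) that are not Kafka's actual RequestChannel API. A semaphore counts pending requests across both queues, so an idle handler thread blocks until either kind arrives instead of waiting on only one queue.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: these names are illustrative, not Kafka's RequestChannel API.
final class PrioritizedRequestChannel {
    static final class Request {
        final boolean fromController;
        final String payload;

        Request(boolean fromController, String payload) {
            this.fromController = fromController;
            this.payload = payload;
        }
    }

    private final ConcurrentLinkedQueue<Request> controllerQueue = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<Request> clientQueue = new ConcurrentLinkedQueue<>();
    // One permit per queued request, counted across both queues.
    private final Semaphore pending = new Semaphore(0);

    void send(Request r) {
        (r.fromController ? controllerQueue : clientQueue).add(r);
        pending.release(); // publish the permit only after the request is enqueued
    }

    /**
     * Returns the next request, preferring controller requests, or null if
     * nothing arrives within timeoutMs (or the thread is interrupted).
     */
    Request receive(long timeoutMs) {
        try {
            if (!pending.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        // The permit guarantees one request is queued for us; loop in case it
        // landed in the controller queue just after we checked that queue.
        while (true) {
            Request r = controllerQueue.poll();
            if (r == null) r = clientQueue.poll();
            if (r != null) return r;
        }
    }
}
```

If two client requests are sent and then one controller request, `receive` returns the controller request first, then the client requests in arrival order. The trade-off is that a steady stream of controller requests could starve clients; that is presumably acceptable because controller traffic is typically much lighter than client traffic.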



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
