I'm evaluating whether the KafkaStreams API will be something we can use on my 
current project. Namely, we want to be able to distribute the consumers on a 
Mesos/YARN cluster. It's not entirely clear to me in the code what is deciding 
which partitions get assigned at runtime and whether this is intended for a 
distributed application or just a multi-threaded environment. 
I get that the consumer coordinator will get reassignments when group 
participation changes; however, in looking through the StreamPartitionAssignor 
code, it's not clear to me what is happening in the assign method. It looks 
like to me like subscriptions are coming in from the consumer coordinator, 
presumably whose assignments are derived from the lead brokers for the topics 
of interest. Those subscriptions are then translated into co-partitioned groups 
of clients. Once that's complete, it hands off the co-partitioned groups to the 
StreamThread's partitionGrouper to do the work of assigning the partitions to 
each co-partitioned group. The DefaultPartitionGrouper code, starting on line 
57, simply does a 1-up assigning of partition to group. How will this actually 
work with distributed stream consumers if it's always going to be assigning the 
partition as a 1-up sequence local to that particular consumer? Shouldn't it 
use the assigned partition that is coming back from the ConsumerCoordinator? 
I'm struggling to understand the layers but I need to in order to know whether 
this implementation is going to work for us. If the PartitionGroupAssignor's 
default is just meant for single-node multithreaded use, that's fine as long as 
I can inject my own implementation. But I would still need to understand what 
is happening at the StreamPartitionAssignor layer more clearly. Any info, 
design docs, in-progress wiki's would be most appreciated if the answer is too 
in-depth for an email discussion. Thanks and I really love what you guys are 
doing with this!
Mike

Reply via email to