[ 
https://issues.apache.org/jira/browse/KAFKA-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147635#comment-17147635
 ] 

Joost van de Wijgerd commented on KAFKA-9953:
---------------------------------------------

Hi [~bchen225242],

I agree that we are not using the recommended usage pattern. The issue, however, 
is that the pattern we are using is fully functional (we have been running it in 
production for 9 months now). Because the TransactionManager only supports one 
GroupCoordinator, it keeps 'flipping' between group coordinators, and with the 
default retry backoff a 100ms time penalty is incurred every time this happens 
(since discovering this issue we have set the timeout to 0ms). I have patched 
kafka-clients 2.5.0 with my fix and we are currently running this in production 
with no issues whatsoever. 
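
For reference, the 0ms workaround is a plain producer setting. The snippet below is a minimal sketch, assuming the timeout in question is retry.backoff.ms (the setting named in the issue description). The class name, broker address and transactional id are placeholders, not taken from our setup.

```java
import java.util.Properties;

class ProducerWorkaroundConfig {
    // Builds producer properties with the retry backoff disabled.
    // Keys are standard producer config names; values are placeholders.
    static Properties build(String bootstrapServers, String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("transactional.id", transactionalId);
        // Default is 100 (ms); 0 removes the pause after each coordinator flip.
        props.setProperty("retry.backoff.ms", "0");
        return props;
    }
}
```

Note that this only removes the pause. The failing TxnOffsetCommitRequest and the extra FindCoordinatorRequest still happen on every flip.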

As to your point about consumer groups rebalancing: correct me if I am wrong, 
but I think this has no impact on the location of the ConsumerGroupCoordinator 
on the Broker. My fix merely keeps track of which Broker hosts a given 
ConsumerGroupCoordinator, so I don't see how this would be an issue.
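
To illustrate the difference, here is a small self-contained simulation. It is not the attached patch and does not use the real TransactionManager. It only counts how many coordinator lookups are needed when a single coordinator node is remembered versus one node per consumerGroupId. All class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class CoordinatorLookupSketch {
    // Single slot: only the most recently discovered coordinator node is kept.
    // A commit for a group hosted on a different node forces a new lookup.
    static int lookupsWithSingleSlot(String[] commits, Map<String, Integer> actual) {
        Integer cachedNode = null;
        int lookups = 0;
        for (String groupId : commits) {
            int correctNode = actual.get(groupId);
            if (cachedNode == null || cachedNode != correctNode) {
                cachedNode = correctNode; // result of a fresh lookup
                lookups++;
            }
        }
        return lookups;
    }

    // Per-group map: each group's coordinator node is looked up once and kept.
    static int lookupsWithPerGroupMap(String[] commits, Map<String, Integer> actual) {
        Map<String, Integer> cache = new HashMap<>();
        int lookups = 0;
        for (String groupId : commits) {
            if (!cache.containsKey(groupId)) {
                cache.put(groupId, actual.get(groupId));
                lookups++;
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        Map<String, Integer> actual = new HashMap<>();
        actual.put("g1", 1); // group g1 coordinated by node 1
        actual.put("g2", 2);
        actual.put("g3", 1);
        String[] commits = {"g1", "g2", "g3", "g1", "g2", "g3"};
        System.out.println(lookupsWithSingleSlot(commits, actual));  // 5
        System.out.println(lookupsWithPerGroupMap(commits, actual)); // 3
    }
}
```

The sketch ignores coordinator moves (broker failure, partition leadership changes). In that case an entry in the per-group map would have to be invalidated and looked up again.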

I do agree with you that if you use the many -> one mapping you 
cannot/should not use the automatic rebalancing. We are indeed using our own 
assignment strategy because we want partitions of different Topics with the 
same ordinal to map to the same application instance. If we let Kafka do the 
allocation, I think this pattern would indeed not work correctly. 
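
As a rough sketch of what "same ordinal maps to the same instance" means: the modulo rule, the class name and the topic names below are assumptions for illustration only, not our actual assignment strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrdinalAssignmentSketch {
    // Returns the "topic-partition" names owned by one application instance.
    // Assumed rule: partition p of every topic goes to instance p % instanceCount,
    // so equal ordinals across topics always land on the same instance.
    static List<String> partitionsFor(int instanceIndex, int instanceCount,
                                      Map<String, Integer> partitionCounts) {
        List<String> owned = new ArrayList<>();
        // TreeMap gives a stable, sorted topic order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(partitionCounts).entrySet()) {
            for (int p = 0; p < e.getValue(); p++) {
                if (p % instanceCount == instanceIndex) {
                    owned.add(e.getKey() + "-" + p);
                }
            }
        }
        return owned;
    }
}
```

With two instances and two topics of four partitions each, instance 1 would own partitions 1 and 3 of both topics.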

I am sticking to my standpoint that implementing this improvement does not hurt 
the current recommended pattern at all but it does support the many to one 
pattern in a performant way. I don't think you have to update your 
documentation for this unless you want to specifically point this out to your 
users. 

If you decide not to implement this improvement, I would opt to log a WARN 
message that alerts the developer to this issue so they can fix the problem at 
an early stage of development. Currently there is only an INFO message when a 
new ConsumerGroupCoordinator is found; this was my only clue to finding the 
problem, and unfortunately it came after we had implemented our framework 
around the many consumers -> one producer concept.
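
One possible shape for such a warning, as a sketch only: the class below is hypothetical, uses java.util.logging instead of the client's own logger, and the threshold is an arbitrary choice (it should be at least 1).

```java
import java.util.logging.Logger;

class CoordinatorFlipMonitor {
    private static final Logger LOG =
            Logger.getLogger(CoordinatorFlipMonitor.class.getName());

    private final int threshold; // flips tolerated before warning, >= 1
    private Integer lastNode;    // node id of the last discovered coordinator
    private int flips;           // flips seen since the last warning

    CoordinatorFlipMonitor(int threshold) {
        this.threshold = threshold;
    }

    // Call whenever a group coordinator is (re)discovered.
    // Returns true if a warning was logged on this call.
    boolean onCoordinatorDiscovered(String groupId, int nodeId) {
        if (lastNode != null && lastNode != nodeId) {
            flips++;
        }
        lastNode = nodeId;
        if (flips >= threshold) {
            LOG.warning("Group coordinator changed " + flips
                    + " times (latest: group " + groupId + " on node " + nodeId
                    + "); is one producer committing offsets for several groups?");
            flips = 0;
            return true;
        }
        return false;
    }
}
```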

To answer your question: implementing a proper switch to the one to one 
Consumer Producer mapping would be a big change for us. Pairing extra Producers 
with our existing Consumers should be a lot easier, but we would essentially be 
using them to implement the Map of ConsumerGroupCoordinators, so for me it is 
then a better option to run with a patched kafka-clients library. However, in 
the long run this is also not very sustainable. 

Best Regards,
Joost

> support multiple consumerGroupCoordinators in TransactionManager
> ----------------------------------------------------------------
>
>                 Key: KAFKA-9953
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9953
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 2.5.0
>            Reporter: Joost van de Wijgerd
>            Priority: Major
>         Attachments: KAFKA-9953.patch
>
>
> We are using Kafka with a transactional producer and have the following use 
> case:
> 3 KafkaConsumers (each with their own ConsumerGroup) polled by the same 
> thread and 1 transactional Kafka producer. When we add the offsets to the 
> transaction we run into the following problem: 
> TransactionManager only keeps track of 1 consumerGroupCoordinator, but some 
> consumerGroupCoordinators can be on another node. As a result we constantly 
> see the TransactionManager switching between nodes, which has an overhead of 
> 1 failing _TxnOffsetCommitRequest_ and 1 unnecessary 
> _FindCoordinatorRequest_.
> Also, with _retry.backoff.ms_ set to 100 by default, this causes a pause 
> of 100ms for every other transaction (depending on which KafkaConsumer 
> triggered the transaction, of course).
> If the TransactionManager could keep track of coordinator nodes per 
> consumerGroupId this problem would be solved. 
> I already have a patch for this but still need to test it. I will add it to 
> the ticket when that is done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
