[ 
https://issues.apache.org/jira/browse/KAFKA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242461#comment-15242461
 ] 

Ewen Cheslack-Postava commented on KAFKA-3549:
----------------------------------------------

[~granthenke] [~ijuma] So, I verified that this doesn't fix the transient 
failures, but I committed it anyway since the cleanup was definitely worthwhile.

However, I also thought a bit more about how this could get triggered. I 
thought of 2 things:

1. We have some background threads that run consumer polling. If any of those 
are not shut down, they could continue connecting to any clusters that were 
given the same port and mess up partition assignment (if the groups use the 
same ID). I checked this out and it looks like they are all being shut down 
properly.
2. Some tests try to use fixed ports because they need to restart brokers and 
we can't guarantee tests will pass properly without a consistent set of ports 
to seed consumer metadata with. These tests need to override generateConfigs() 
to use FixedPortTestUtils. This was never guaranteed to work perfectly; it's 
basically just the approach we settled on because we don't have a better one 
for running this style of test when we can't guarantee certain resources (i.e. 
ports) will be available. In particular, while the broker is shut down, it's 
at least possible that another test allocates the port and starts using it 
for its own broker. Depending on the timeouts in the tests and how long they 
all take to run, some assertions (e.g. those using 
TestUtils.waitUntilTrue()) might still pass even if another broker manages 
to bind the port and temporarily get in the way; meanwhile, the extra 
consumer connecting will interfere with the other test.
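
To illustrate why a retry-style assertion can hide a temporary disruption, 
here is a minimal Java sketch of a polling helper. It is a hypothetical 
stand-in, not Kafka's TestUtils.waitUntilTrue(); the class name, method 
signature, and timeout values are assumptions made for illustration only. The 
point is that the helper succeeds as soon as the condition holds once, so 
failed checks before that moment leave no trace in the test result.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a waitUntilTrue-style test helper (not Kafka's code).
class WaitUntilSketch {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true on the first successful check, false on timeout or interrupt.
     */
    static boolean waitUntilTrue(BooleanSupplier condition, long timeoutMs, long pauseMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // earlier failed checks are simply forgotten
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulate a condition that fails twice (e.g. the wrong broker is on
        // the port) and then succeeds once the expected broker is back.
        int[] checks = {0};
        boolean passed = waitUntilTrue(() -> ++checks[0] >= 3, 2000, 10);
        System.out.println("passed=" + passed + ", checks=" + checks[0]);
    }
}
```

A test built on a helper like this reports success even though the first two 
checks failed, which is why interference from another test's broker can go 
unnoticed in one test while still breaking the other.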

If FixedPortTestUtils *isn't* properly used, I think there are other ways 
things can fail too. For example, we might bind using port = 0; restarting 
the broker will then result in it listening on a different port, while another 
test can be given the original port, so the consumers in the first test will 
connect to brokers in the second test. At a minimum, I think ProducerFailureHandlingTest, 
maybe BaseTopicMetadataTest.testIsrAfterBrokerShutDownAndJoinsBack, 
RollingBounceTest, UncleanLeaderElectionTest, 
ProducerTest.testSendWithDeadBroker, and probably more are buggy in that they 
do broker restarts but don't seem to use fixed ports.
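
The port = 0 scenario can be sketched with plain java.net sockets. This is 
only an illustration under stated assumptions: it does not use Kafka's test 
harness, and the class and method names (PortReuseSketch, listen, isFree) are 
hypothetical. It shows that a port obtained via port = 0 is only reserved 
while the listener is open, and that anything else can bind it once the 
listener closes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical illustration; a ServerSocket stands in for a broker listener.
class PortReuseSketch {

    /** Opens a listener; port 0 asks the OS to pick any free port. */
    static ServerSocket listen(int port) {
        try {
            return new ServerSocket(port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes a listener, releasing its port back to the OS. */
    static void close(ServerSocket socket) {
        try {
            socket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a new listener could bind the port right now. */
    static boolean isFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ServerSocket broker = listen(0);
        int port = broker.getLocalPort();
        System.out.println("held while running: " + !isFree(port));

        // "Broker shutdown": from here on, nothing reserves the port.
        close(broker);
        System.out.println("free after shutdown: " + isFree(port));

        // A "restart" with port = 0 gets whatever the OS hands out, which
        // need not be the port that clients were originally configured with.
        ServerSocket restarted = listen(0);
        System.out.println("old port=" + port + ", new port=" + restarted.getLocalPort());
        close(restarted);
    }
}
```

Clients that cached the original port would keep connecting to it, whoever 
holds it at that point.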

I'm not sure why the consumer tests would be triggering assertions way more 
frequently than any other test. But given the way tests are run, it doesn't 
look likely to be an issue with state accidentally being held across multiple 
tests. Ports, on the other hand, are a resource that we can't guarantee we 
hold onto across broker restarts (i.e. they may not be valid for the entire 
test), yet consumers have them fixed from initialization, so we could 
potentially see cross-test contamination.

> Close consumers instantiated in consumer tests
> ----------------------------------------------
>
>                 Key: KAFKA-3549
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3549
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Grant Henke
>            Assignee: Grant Henke
>             Fix For: 0.10.1.0
>
>
> Close consumers instantiated in consumer tests. Since these consumers often 
> use the default group.id of "", they could cause transient failures like 
> those seen in KAFKA-3117 and KAFKA-2933. I have not been able to prove that 
> this change will fix those failures, but closing the consumers is a good 
> practice regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
