[ https://issues.apache.org/jira/browse/KAFKA-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855170#comment-17855170 ]
Greg Harris commented on KAFKA-16967:
-------------------------------------

I looked into the IllegalStateException more to see if it was worth preventing. I think this appears in the tests because Linux allows fast port reuse on loopback adapters: [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79e9fed460385a3d8ba0b5782e9e74405cb199b1]

This shouldn't happen in the wild, as the 2MSL (4 minute) timeout for port reuse should be much longer than any race condition in the code, and it would probably be more complicated to remove the registration state before closing the underlying transport.

I think this ticket can stand alone, and we don't need to change the double-registration behavior of the Selector.

> NioEchoServer fails to register connection and causes flaky failure
> -------------------------------------------------------------------
>
>                 Key: KAFKA-16967
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16967
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Greg Harris
>            Assignee: TengYao Chi
>            Priority: Minor
>              Labels: flaky-test, newbie
>
> The NioEchoServer calls Selector#register for new connections. This call can
> throw exceptions, which then kill the NioEchoServer. This has been observed
> in the SslTransportLayerTest testUngracefulRemoteCloseDuringHandshake*
> methods.
> {noformat}
> Exception in thread "echoserver" java.lang.IllegalStateException: There is already a connection for id 127.0.0.1:40007-127.0.0.1:43710
>     at org.apache.kafka.common.network.Selector.ensureNotRegistered(Selector.java:322)
>     at org.apache.kafka.common.network.Selector.register(Selector.java:310)
>     at org.apache.kafka.common.network.NioEchoServer.run(NioEchoServer.java:229){noformat}
> This causes the test to fail with essentially a timeout, when the connection
> is expired for becoming idle unexpectedly:
> {noformat}
> org.opentest4j.AssertionFailedError: Unexpected channel state EXPIRED ==> expected: <true> but was: <false>
>     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>     at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>     at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)
>     at org.apache.kafka.common.network.SslTransportLayerTest.testIOExceptionsDuringHandshake(SslTransportLayerTest.java:898)
>     at org.apache.kafka.common.network.SslTransportLayerTest.testUngracefulRemoteCloseDuringHandshakeRead(SslTransportLayerTest.java:837){noformat}
> Instead, the NioEchoServer should handle exceptions from register in a
> similar fashion to the SocketServer.
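
For illustration only, here is a minimal sketch of the kind of handling the description asks for: a failed registration drops the one offending connection, and the exception does not escape and kill the server thread. It is not the actual patch and does not use the Kafka classes. `ConnectionRegistry`, `EchoLoopSketch`, and `tryRegister` are hypothetical names standing in for the Selector and the NioEchoServer accept loop, and the real change may be shaped differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a selector that rejects duplicate connection ids.
class ConnectionRegistry {
    private final Map<String, AutoCloseable> channels = new HashMap<>();

    void register(String id, AutoCloseable channel) {
        if (channels.containsKey(id)) {
            throw new IllegalStateException("There is already a connection for id " + id);
        }
        channels.put(id, channel);
    }

    int size() {
        return channels.size();
    }
}

// Hypothetical helper showing how an accept loop could survive a failed register.
class EchoLoopSketch {
    // Returns true if the channel was registered. On failure the channel is
    // closed and false is returned, so the caller's loop can keep running.
    static boolean tryRegister(ConnectionRegistry registry, String id, AutoCloseable channel) {
        try {
            registry.register(id, channel);
            return true;
        } catch (IllegalStateException e) {
            // Drop only this connection instead of letting the exception propagate.
            try {
                channel.close();
            } catch (Exception closeError) {
                e.addSuppressed(closeError);
            }
            return false;
        }
    }
}
```

With this shape, a second connection that reuses an id already held by the registry is closed and skipped, while the first connection and the loop itself are unaffected.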