[ 
https://issues.apache.org/jira/browse/KAFKA-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754771#comment-17754771
 ] 

Greg Harris commented on KAFKA-15343:
-------------------------------------

Hi [~prasanth] and thank you for reporting this issue! It is certainly not good 
that one test can cause the whole build to fail, preventing other tests from 
running.

Can you speak to how frequently you've seen this failure? Naively, I would 
expect that with >10000 ephemeral ports available, such a failure would be 
quite rare.

If this is true, I don't think it is appropriate to disable these tests. They 
are extremely important test coverage for the MirrorMaker2 feature, and 
disabling them may lead to undetected regressions.

As far as resolving this issue, I think we should:

1. Find where we are leaking Kafka clients in the MM2 integration test suites, 
either within the framework or within the Mirror connectors.

2. Close Kafka clients in a timely fashion (some relevant work in 
https://issues.apache.org/jira/browse/KAFKA-14725 and 
https://issues.apache.org/jira/browse/KAFKA-15090 )

3. Try to reproduce the Gradle daemon crash in a more controlled environment
 
4. Report the daemon crash to the Gradle upstream



Since random port selection and port-reuse are standard procedures (not 
specific to Kafka) there could be downstream projects using Gradle that are 
affected. If there is something specific about the Kafka clients' connections 
that affects Gradle, then we should investigate further to help the Gradle 
project resolve the issue.
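
For anyone trying to reproduce the port-reuse scenario outside of the Kafka 
build, here is a minimal sketch using only the JDK. It is not taken from the 
Kafka or Gradle code: the class and method names are hypothetical, and the 
explicit re-bind to the remembered port only simulates the OS handing a 
previously used ephemeral port to an unrelated listener (as the Gradle worker 
did with port 37557 in the logs). It assumes nothing else grabs the port in 
between, which may not hold on a busy machine.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class; not part of Kafka or Gradle.
class PortReuseSketch {

    /** Binds to port 0 so the OS picks a free port, closes the listener, and returns that port. */
    static int bindEphemeralThenClose() throws IOException {
        try (ServerSocket first = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return first.getLocalPort();
        }
    }

    /**
     * Returns true if a client that still remembers the old port ends up
     * connected to an unrelated listener that later bound the same port.
     */
    static boolean staleClientReachesNewListener() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int port = bindEphemeralThenClose(); // stands in for the stopped broker

        // Stands in for the unrelated process (e.g. a build worker) that is
        // later given the same port. Binding explicitly is an assumption made
        // to keep the sketch deterministic; the OS would do this by chance.
        try (ServerSocket second = new ServerSocket()) {
            second.setReuseAddress(true);
            second.bind(new InetSocketAddress(loopback, port));
            second.setSoTimeout(2000); // fail instead of hanging if nothing connects

            // Stands in for a long-lived client that was never closed and
            // keeps retrying the address of the old listener.
            try (Socket staleClient = new Socket()) {
                staleClient.connect(new InetSocketAddress(loopback, port), 2000);
                try (Socket accepted = second.accept()) {
                    // The new listener sees a connection from the stale client.
                    return accepted.getPort() == staleClient.getLocalPort();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("stale client reached new listener: "
                + staleClientReachesNewListener());
    }
}
```

The point of the sketch is that binding to port 0 only guarantees the port is 
free at bind time. It says nothing about clients that still hold the old 
address, which is why closing clients promptly matters here.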

> Fix MirrorConnectIntegrationTests causing ci build failures.
> ------------------------------------------------------------
>
>                 Key: KAFKA-15343
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15343
>             Project: Kafka
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 3.6.0
>            Reporter: Prasanth Kumar
>            Priority: Major
>
> There are several instances of tests interacting badly with gradle daemon(s) 
> running on ports that the kafka broker previously used. After going through 
> the debug logs we observed a few retrying kafka clients trying to connect to 
> broker which got shutdown and the gradle worker chose the same port on which 
> broker was running. Later in the build, the gradle daemon attempted to 
> connect to the worker and could not, triggering a failure. Ideally gradle 
> would not exit when connected to from an invalid client - in testing with 
> netcat, it would often handle these without dying. However there appear to be 
> some cases where the daemon dies completely. Both the broker code and the 
> gradle workers bind to port 0, resulting in the OS assigning it an unused 
> port. This does avoid conflicts, but does not ensure that long lived clients 
> do not attempt to connect to these ports afterwards. It's possible that 
> closing the client in between may be enough to work around this issue. Till 
> then we will disable the test to avoid the ci blocker from testing the code 
> changes.
> *MirrorConnectorsIntegrationBaseTest and extending Tests*
> {code:java}
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+0000 [DEBUG] 
> [TestEventLogger] 
> MirrorConnectorsWithCustomForwardingAdminIntegrationTest > 
> testReplicateSourceDefault() STANDARD_OUT
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+0000 [DEBUG] 
> [TestEventLogger]     [2023-07-04 11:47:46,799]
>  INFO primary REST service: http://localhost:43809/connectors 
> (org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest:224)
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+0000 [DEBUG] 
> [TestEventLogger]     [2023-07-04 11:47:46,799] 
> INFO backup REST service: http://localhost:43323/connectors 
> (org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest:225)
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+0000 [DEBUG] 
> [TestEventLogger]     [2023-07-04 11:47:46,799] 
> INFO primary brokers: localhost:37557 
> (org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest:226)
> [2023-07-04T11:59:12.968Z] 2023-07-04T11:59:12.900+0000 [DEBUG] 
> [org.gradle.internal.remote.internal.inet.TcpIncomingConnector] 
> Accepted connection from /127.0.0.1:47660 to /127.0.0.1:37557.
> [2023-07-04T11:59:13.233Z] 
> org.gradle.internal.remote.internal.MessageIOException: Could not read 
> message from '/127.0.0.1:47660'.
> [2023-07-04T11:59:12.970Z] 2023-07-04T11:59:12.579+0000 [DEBUG] 
> [org.gradle.internal.remote.internal.inet.TcpIncomingConnector] Listening on 
> [d6bf30cb-bca2-46d9-8aeb-b9fd0497f54d port:37557, 
> addresses:[localhost/127.0.0.1]].
> [2023-07-04T11:59:46.519Z] 2023-07-04T11:59:13.014+0000 [ERROR] 
> [system.err] org.gradle.internal.remote.internal.ConnectException: Could not 
> connect to server [d6bf30cb-bca2-46d9-8aeb-b9fd0497f54d port:37557, 
> addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1]. {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
