Luke Chen created KAFKA-14242:
---------------------------------
Summary: Hanging logManager in
testReloadUpdatedFilesWithoutConfigChange test
Key: KAFKA-14242
URL: https://issues.apache.org/jira/browse/KAFKA-14242
Project: Kafka
Issue Type: Test
Reporter: Luke Chen
Assignee: Luke Chen
Recently, many builds have failed (and been terminated) because of a
core:unitTest failure. The failure messages look like this:
FAILURE: Build failed with an exception.
[2022-09-14T09:51:52.190Z]
[2022-09-14T09:51:52.190Z] * What went wrong:
[2022-09-14T09:51:52.190Z] Execution failed for task ':core:unitTest'.
[2022-09-14T09:51:52.190Z] > Process 'Gradle Test Executor 128' finished with
non-zero exit value 1
After investigating, I found one cause (there may be others).
In {{BrokerMetadataPublisherTest#testReloadUpdatedFilesWithoutConfigChange}},
we create a logManager twice, but during cleanup we close only one of them.
As a result, one log cleaner keeps running. Meanwhile, the temp log dirs are
deleted, so the JVM is halted via {{Exit.halt(1)}}, which produces the error we
saw in Gradle. This is the same code we run when we hit an IOException in all
of our log dirs:
fatal(s"Shutdown broker because all log dirs in ${logDirs.mkString(", ")} have failed")
Exit.halt(1)
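One way to guard against this kind of leak is to have the test record every
instance it creates and close all of them during cleanup. The following is only
a minimal sketch of that pattern, not the actual fix: FakeLogManager,
LogManagerFixture and LogManagerFixtureDemo are hypothetical names invented for
illustration and are not part of Kafka.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a resource that owns background work,
// such as a log manager with its log cleaner. This is NOT Kafka's LogManager.
final class FakeLogManager(val name: String) extends AutoCloseable {
  @volatile private var closed = false
  def isClosed: Boolean = closed
  override def close(): Unit = closed = true
}

// Hypothetical test fixture: it records every instance it creates, so
// cleanup cannot miss one when a test creates more than one.
final class LogManagerFixture {
  private val created = ArrayBuffer.empty[FakeLogManager]

  def newLogManager(name: String): FakeLogManager = {
    val lm = new FakeLogManager(name)
    created += lm
    lm
  }

  // Close in reverse creation order and keep going if one close() throws.
  def closeAll(): Unit = {
    created.reverseIterator.foreach { lm =>
      try lm.close()
      catch {
        case e: Exception =>
          System.err.println(s"close failed for ${lm.name}: $e")
      }
    }
    created.clear()
  }
}

object LogManagerFixtureDemo {
  // Returns the closed state of both instances after cleanup.
  def run(): Seq[Boolean] = {
    val fixture = new LogManagerFixture
    val first = fixture.newLogManager("first")
    val second = fixture.newLogManager("second") // the one that was leaked
    try {
      // test body would go here
    } finally fixture.closeAll()
    Seq(first.isClosed, second.isClosed)
  }
}
```

With this shape, a second logManager created in the middle of a test is closed
by the same cleanup call as the first one.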
Why does the test sometimes pass and sometimes fail? During test cluster
close, we shut down the broker first and then the other components, and the log
cleaner is triggered at an interval. So if the cluster closes fast enough and
the test finishes, the test passes. Otherwise, the process exits with status 1.
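The timing dependence can be illustrated with a small simulation. This is a
simplified sketch under stated assumptions, not Kafka code: CleanerRaceDemo and
its parameters are hypothetical, the periodic task stands in for a log cleaner
pass, and closeDurationMs stands in for how long the cluster takes to close.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object CleanerRaceDemo {
  // Returns true if the periodic task fired at least once before the
  // simulated close finished, false if the close won the race.
  def cleanerRanBeforeClose(cleanerIntervalMs: Long, closeDurationMs: Long): Boolean = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val ran = new CountDownLatch(1)
    // Hypothetical stand-in for the interval-triggered log cleaner pass.
    scheduler.scheduleAtFixedRate(
      () => ran.countDown(), cleanerIntervalMs, cleanerIntervalMs, TimeUnit.MILLISECONDS)
    // Wait at most closeDurationMs; returns early if the task fires first.
    val fired = ran.await(closeDurationMs, TimeUnit.MILLISECONDS)
    scheduler.shutdownNow()
    fired
  }
}
```

A short close relative to the interval, for example
cleanerRanBeforeClose(60000L, 10L), means the task never fires, which
corresponds to the passing case. A long close relative to the interval, for
example cleanerRanBeforeClose(20L, 5000L), means the task fires, which
corresponds to the case where the leaked cleaner finds its log dirs gone.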
--
This message was sent by Atlassian Jira
(v8.20.10#820010)