kowshik commented on a change in pull request #9680:
URL: https://github.com/apache/kafka/pull/9680#discussion_r540476157



##########
File path: core/src/main/scala/kafka/log/LogManager.scala
##########
@@ -479,14 +479,14 @@ class LogManager(logDirs: Seq[File],
 
     try {
       for ((dir, dirJobs) <- jobs) {
-        val hasErrors = dirJobs.map { future =>
+        val hasErrors = dirJobs.exists  { future =>
           Try(future.get) match {
             case Success(_) => false
             case Failure(e) =>
               warn(s"There was an error in one of the threads during 
LogManager shutdown: ${e.getCause}")
               true
           }
-        }.contains(true)
+        }

Review comment:
       @chia7712 @ijuma I think you were referring to 
https://github.com/apache/kafka/pull/9596 ?
   
   The test added in #9596 to verify this was 
`LogManagerTest.testHandlingExceptionsDuringShutdown` 
([link](https://github.com/apache/kafka/blob/dcbd28da53bfc6edb78fd1735993163fc6f4c7c7/core/src/test/scala/unit/kafka/log/LogManagerTest.scala#L94)).
 I agree that the test wasn't effective enough to catch the problem. The issue 
was that fundamentally this area of the code was a little bit difficult to test 
due to the asynchronous behavior.
   
   So for example, although @chia7712 has a fix in #9728, I still think it is 
not easy to write a test that reliably fails if the bug is introduced again.
   
   Below is my explanation on why it is difficult to improve the existing 
`testHandlingExceptionsDuringShutdown` to catch such issues. Let us imagine 
that the test setup introduced multiple logs for each log dir, and setup an 
expectation (in the test code) that if one of the logs failed during `close()`, 
then, the other logs under the same log directory should still be closed by the 
time `LogManager.shutdown()` returns. Such a kind of test expectatio would have 
caught the problem introduced in this PR. Now, writing that kind of a test was 
not easy, because, even if one of the logs failed to close, the other futures 
can still be completed by the time `LogManager.shutdown()` returns (there is no 
strong need to call `future.get` for the future to pass). Thus, the test can 
pass despite the bug.
   
   One potential solution (a bigger change) is to see if we can inject a 
different thread pool executor that only during test executes the threads 
lazily i.e. it executes the submitted threads only when `future.get` is called, 
otherwise it will never execute them. If we succeed in doing this, then, the 
test can be improved such that unless `future.get` is called on all logs, some 
of the log `close()` will never happen and will fail the test in the presence 
of the bug introduced by this PR.
   
   cc @junrao 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to