ableegoldman commented on a change in pull request #9380:
URL: https://github.com/apache/kafka/pull/9380#discussion_r501392266



##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/ProcessorStateManager.java
##########
@@ -603,7 +603,11 @@ public void checkpoint() {
         try {
             checkpointFile.write(checkpointingOffsets);
         } catch (final IOException e) {
-            log.warn("Failed to write offset checkpoint file to [{}]", checkpointFile, e);
+            log.warn("Failed to write offset checkpoint file to [{}]." +
+                " This may occur if OS cleaned the state.dir in case when it located in /tmp directory." +
+                " You can change location for state.dir to resolve problem." +
+                " This can also occur due to running multiple instances on the same machine using the same state dir.",

Review comment:
       @mjsax you can't run multiple instances on the same machine with the
same state.dir. For one thing, the locking mechanism is per-process*. If you
run two different instances, you can get an active task on one instance and
the corresponding standby on the other. Each would think it owned the lock
for that task directory, and they would access it concurrently (leading to a
FileNotFoundException if one of them deletes the checkpoint file, for example).
   
   *On some systems. On others it isn't per-process, but then you hit the
opposite problem, where a task is deadlocked because the other process grabbed
the lock for its directory first.
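
To make the locking point concrete, here is a minimal sketch using plain `java.nio` file locks. It is not the actual Kafka Streams code: the class `TaskDirLockSketch`, the helper `tryLockTaskDir`, the `.lock` file name, and the `0_0` task directory are all assumptions made for illustration. It only shows what happens inside a single JVM, where a second attempt on the same lock file is detected. Whether a second process would be blocked depends on the platform, which is the problem described above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TaskDirLockSketch {

    // Hypothetical helper: try to take an exclusive lock on "<taskDir>/.lock".
    // Returns the lock, or null if it could not be acquired.
    static FileLock tryLockTaskDir(final Path taskDir) throws IOException {
        Files.createDirectories(taskDir);
        final FileChannel channel = FileChannel.open(
            taskDir.resolve(".lock"),
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE);
        try {
            // tryLock() returns null if another program holds an overlapping lock.
            final FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (final OverlappingFileLockException e) {
            // Thrown when this same JVM already holds a lock on the file.
            channel.close();
            return null;
        }
    }

    public static void main(final String[] args) throws IOException {
        // Assumed layout: one directory per task under a shared state dir.
        final Path stateDir = Files.createTempDirectory("state-dir-sketch");
        final Path taskDir = stateDir.resolve("0_0");

        final FileLock first = tryLockTaskDir(taskDir);
        final FileLock second = tryLockTaskDir(taskDir);

        System.out.println("first acquired: " + (first != null));
        System.out.println("second acquired: " + (second != null));

        if (first != null) {
            // Closing the channel releases the lock.
            first.channel().close();
        }
    }
}
```

Within one JVM the second attempt should come back as `null`, because Java file locks are held on behalf of the whole virtual machine. Two separate instances pointed at the same state.dir get no such guarantee, so the safer setup is a distinct state.dir per instance.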





