ableegoldman commented on a change in pull request #9380:
URL: https://github.com/apache/kafka/pull/9380#discussion_r501392266
##########
File path:
streams/src/main/java/org/apache/kafka/streams/processor/internals/ProcessorStateManager.java
##########
@@ -603,7 +603,11 @@ public void checkpoint() {
try {
checkpointFile.write(checkpointingOffsets);
} catch (final IOException e) {
- log.warn("Failed to write offset checkpoint file to [{}]", checkpointFile, e);
+ log.warn("Failed to write offset checkpoint file to [{}]." +
+ " This may occur if OS cleaned the state.dir in case when it located in /tmp directory." +
+ " You can change location for state.dir to resolve problem." +
+ " This can also occur due to running multiple instances on the same machine using the same state dir.",
Review comment:
@mjsax you can't run multiple instances on the same machine with the
same state.dir. For one thing, the locking mechanism is per-process*. If you
run two different instances, you can get an active task on one instance and
the corresponding standby on the other. Each would think it owned the lock
for that task directory, and they would access it concurrently (leading to a
FileNotFoundException if one of them deletes the checkpoint, for example).
*On some systems. On others the lock is not per-process, but then you hit the
opposite problem: a task is deadlocked because the other process grabbed the
lock for its directory first.
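
To make the locking point concrete, here is a minimal sketch of how a `java.nio` file lock behaves inside a single JVM. It assumes the task-directory lock is a `FileLock` taken on a lock file; the class name `TaskDirLockSketch`, the helper method, and the `.lock` file name are hypothetical and used only for illustration. It shows only the in-process half: a second `tryLock()` on the same file from the same JVM is rejected with `OverlappingFileLockException`. Per the `FileLock` Javadoc, locks are held on behalf of the whole JVM, and whether they stop another process is system-dependent, which is the cross-process case discussed above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TaskDirLockSketch {

    // Hypothetical helper: takes an exclusive lock on lockFile, then tries to
    // take it again through a second channel in the same JVM. Returns true if
    // the second attempt is rejected with OverlappingFileLockException.
    static boolean secondLockRejectedInSameJvm(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(
                 lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(
                 lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            FileLock held = first.tryLock();
            if (held == null) {
                // tryLock() returns null when another program holds the lock.
                throw new IllegalStateException("lock is held by another process");
            }
            try {
                second.tryLock();
                return false; // the JVM allowed a second overlapping lock
            } catch (OverlappingFileLockException e) {
                return true;  // the JVM already holds a lock on this file
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary directory standing in for a task directory (assumption).
        Path taskDir = Files.createTempDirectory("task-dir-sketch");
        Path lockFile = taskDir.resolve(".lock");
        System.out.println("second lock rejected: " + secondLockRejectedInSameJvm(lockFile));
    }
}
```

This check is enforced by the JVM itself, so it says nothing about a second instance running in a separate process against the same state.dir; giving each instance its own state.dir avoids relying on that platform-dependent behavior.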