Hey guys,

Jaikiran posted a patch on KAFKA-1853 to improve the handling of failures
during delete.
https://issues.apache.org/jira/browse/KAFKA-1853

The core problem is that the delete sequence calls File.renameTo(), which
returns false instead of throwing when the rename fails. Our file delete
sequence is roughly the following:
1. Remove the file from the index so no new reads can begin on it
2. Rename the file to xyz.deleted so that if we crash it will get cleaned up
3. Schedule a task to delete the file in 30 seconds or so, by which time any
in-progress reads have likely completed. The goal is to avoid errors on
in-progress reads without taking a lock on every read.
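The three steps above could be sketched roughly like this. It is a hypothetical illustration, not Kafka's actual code: the class name, the in-memory index map, and the delay parameter are assumptions made for the example.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the three-step delete sequence; the class, the index
// map and the ".deleted" naming are illustrative assumptions, not Kafka's code.
class AsyncSegmentDeleter {
    private final Map<Long, File> index;              // base offset -> segment file
    private final ScheduledExecutorService scheduler;
    private final long delayMs;                       // ~30 s in the scheme above

    AsyncSegmentDeleter(Map<Long, File> index, ScheduledExecutorService scheduler, long delayMs) {
        this.index = index;
        this.scheduler = scheduler;
        this.delayMs = delayMs;
    }

    /** Returns false if the segment was not indexed or the rename failed. */
    boolean delete(long baseOffset) {
        // 1. Remove from the index so no new reads can begin on the file.
        File file = index.remove(baseOffset);
        if (file == null) {
            return false;
        }
        // 2. Rename to *.deleted so a crash before step 3 still leaves it marked for cleanup.
        File renamed = new File(file.getPath() + ".deleted");
        if (!file.renameTo(renamed)) {
            return false; // the open question in this thread: what should happen here?
        }
        // 3. Delete later, giving in-progress reads time to finish without a lock.
        scheduler.schedule(() -> { renamed.delete(); }, delayMs, TimeUnit.MILLISECONDS);
        return true;
    }
}
```

The delayed delete is what lets readers proceed lock-free: a reader that already holds the file keeps working until the scheduled task runs.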

The question is what to do when the rename fails. Previously we ignored the
return value, so the file was never deleted at all. This patch changes it so
that if the rename fails we log an error and force an immediate delete.
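A minimal sketch of the fallback described above (log an error, then delete immediately). This is not the code from the KAFKA-1853 patch; the helper name is hypothetical and java.util.logging stands in for whatever logger Kafka actually uses.

```java
import java.io.File;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical helper illustrating "rename, else log and delete now".
class DeleteWithFallback {
    private static final Logger LOG = Logger.getLogger(DeleteWithFallback.class.getName());

    /** Returns true if the file was renamed and scheduled, or was deleted immediately. */
    static boolean deleteSegment(File file, ScheduledExecutorService scheduler, long delayMs) {
        File renamed = new File(file.getPath() + ".deleted");
        if (file.renameTo(renamed)) {
            // Normal path: defer the delete so in-progress reads can finish.
            scheduler.schedule(() -> { renamed.delete(); }, delayMs, TimeUnit.MILLISECONDS);
            return true;
        }
        // Fallback path: log and delete right away, accepting that a read
        // already in progress on this file may now fail.
        LOG.severe("Failed to rename " + file + " to " + renamed + "; deleting immediately");
        return file.delete();
    }
}
```

The trade-off is visible in the fallback branch: the file is reliably removed, but the protection for in-progress reads is given up for that one file.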

I think this is the right thing to do, but I guess the real question is why
a rename would fail at all. Some possibilities:
http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
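One way to find out why a given rename fails, offered here as a suggestion rather than anything in the patch, is java.nio.file.Files.move, which throws an IOException subtype (for example NoSuchFileException or FileAlreadyExistsException) instead of returning a bare false. The helper below is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rename via Files.move so the failure cause can be logged.
class RenameWithReason {
    /** Returns null on success, otherwise "ExceptionType: message" describing the failure. */
    static String renameForDelete(Path source) {
        Path target = source.resolveSibling(source.getFileName() + ".deleted");
        try {
            // Without REPLACE_EXISTING, an existing target is reported as an error.
            Files.move(source, target);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

Logging this string in the error path would show whether failures come from missing files, name collisions, or permissions.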

An alternative would be to treat this as a filesystem error and shut down, as
we do elsewhere.
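For comparison, the shut-down alternative might look like the sketch below: no immediate delete, just an exception for a top-level handler to treat as fatal. The exception type is hypothetical, and how the process actually shuts down is left to that assumed handler.

```java
import java.io.File;

// Hypothetical sketch of the "treat it as a filesystem error" alternative.
class RenameOrFail {
    /** Hypothetical marker for errors the caller should treat as fatal. */
    static class FatalStorageException extends RuntimeException {
        FatalStorageException(String message) {
            super(message);
        }
    }

    static File renameForDelete(File file) {
        File renamed = new File(file.getPath() + ".deleted");
        if (!file.renameTo(renamed)) {
            // No fallback delete: surface the failure so a top-level handler
            // can shut the process down, as with other filesystem errors.
            throw new FatalStorageException("Failed to rename " + file + " to " + renamed);
        }
        return renamed;
    }
}
```

This keeps in-progress reads safe and makes the failure impossible to miss, at the cost of taking the process down for what might be a transient problem.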

Thoughts?

-Jay
