Hey guys, Jaikiran posted a patch on KAFKA-1853 to improve the handling of failures during delete. https://issues.apache.org/jira/browse/KAFKA-1853
The core problem here is that the delete sequence calls File.renameTo(), which returns false if the rename fails. Our file delete sequence is roughly:

1. Remove the file from the index so no new reads can begin on it.
2. Rename the file to xyz.deleted so that it gets cleaned up if we crash.
3. Schedule a task to delete the file after 30 seconds or so, by which point any in-progress reads have likely completed.

The goal is to avoid errors on in-progress reads while also avoiding locking on every read.

The question is what to do when the rename fails. Previously we ignored the failure, so the file was never deleted at all. This patch changes it so that if the rename fails we log an error and force an immediate delete. I think this is the right thing to do, but the real question is why the rename would fail in the first place. Some possibilities: http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java

An alternative would be to treat this as a filesystem error and shut down, as we do elsewhere.

Thoughts?

-Jay
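For reference, a minimal Java sketch of the three-step sequence plus the patch's fallback (log an error, then delete immediately). It assumes a plain concurrent map as the read index and a ScheduledExecutorService for the delayed delete; SegmentDeleter, asyncDelete, cleanUpDeletedFiles and the map-based index are hypothetical names for illustration, not Kafka's actual code.

```java
import java.io.File;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical sketch of the delete sequence; SegmentDeleter, asyncDelete and
// cleanUpDeletedFiles are illustrative names, not Kafka's actual code.
class SegmentDeleter {
    static final String DELETED_SUFFIX = ".deleted";
    private static final Logger log = Logger.getLogger(SegmentDeleter.class.getName());

    // Assumed read index: readers look up files here, so removing an entry
    // prevents new reads from starting on that file.
    final ConcurrentSkipListMap<Long, File> index = new ConcurrentSkipListMap<>();
    private final ScheduledExecutorService scheduler;
    private final long delayMs;

    SegmentDeleter(ScheduledExecutorService scheduler, long delayMs) {
        this.scheduler = scheduler;
        this.delayMs = delayMs;
    }

    // Returns true if the delete was deferred, false if we fell back to an immediate delete.
    boolean asyncDelete(long baseOffset) {
        // Step 1: drop the file from the index so no new reads can begin on it.
        File file = index.remove(baseOffset);
        if (file == null) {
            throw new IllegalArgumentException("no file indexed at " + baseOffset);
        }
        // Step 2: rename to *.deleted so a crash before step 3 still leaves it marked for cleanup.
        File renamed = new File(file.getPath() + DELETED_SUFFIX);
        if (!file.renameTo(renamed)) {
            // renameTo() reports failure only as false, with no reason attached.
            // Patch behavior: log an error and force an immediate delete.
            // (The alternative discussed: treat it as a filesystem error and shut down.)
            log.severe("rename of " + file + " failed; deleting immediately");
            if (!file.delete()) {
                log.severe("immediate delete of " + file + " also failed");
            }
            return false;
        }
        // Step 3: delete later, once in-progress reads have most likely finished.
        scheduler.schedule(() -> {
            if (!renamed.delete()) {
                log.severe("scheduled delete of " + renamed + " failed");
            }
        }, delayMs, TimeUnit.MILLISECONDS);
        return true;
    }

    // On startup, remove *.deleted files left behind by a crash between steps 2 and 3.
    static int cleanUpDeletedFiles(File dir) {
        File[] leftovers = dir.listFiles((d, name) -> name.endsWith(DELETED_SUFFIX));
        if (leftovers == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        int removed = 0;
        for (File f : leftovers) {
            if (f.delete()) {
                removed++;
            }
        }
        return removed;
    }
}
```

One design note: because File.renameTo() only returns a boolean, the sketch cannot say why the rename failed. java.nio.file.Files.move() throws an IOException describing the cause instead, which may help answer the "why would rename fail?" question in the logs.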