kezhuw commented on code in PR #2141: URL: https://github.com/apache/zookeeper/pull/2141#discussion_r1758154068
########## zookeeper-server/src/main/java/org/apache/zookeeper/server/persistence/FileTxnLog.java: ##########
@@ -701,14 +708,27 @@ public long getStorageSize() {
         /**
          * go to the next logfile
+         *
          * @return true if there is one and false if there is no
          * new file to be read
-         * @throws IOException
          */
         private boolean goToNextLog() throws IOException {
-            if (storedFiles.size() > 0) {
+            if (!storedFiles.isEmpty()) {
                 this.logFile = storedFiles.remove(storedFiles.size() - 1);
-                ia = createInputArchive(this.logFile);
+                try {
+                    ia = createInputArchive(this.logFile);
+                } catch (EOFException ex) {
+                    // If this file is the last log file in the database and is empty,
+                    // it means that the last time the file was created
+                    // before the header was written.
+                    if (storedFiles.isEmpty() && this.logFile.length() == 0) {
+                        boolean deleted = this.logFile.delete();

Review Comment:

> Delete failures are usually due to permissions or disk failure. For example, if the zookeeper user only has read access to the data directory, deleting the empty log file will fail.
> 1) If the disk completes the I/O within the I/O timeout, the request may be retried in the block layer; it is just slower, but it won't fail.
> 2) When a disk I/O timeout occurs, most filesystems remount as read-only, and ZooKeeper will then fail with insufficient permissions.
> 3) In my experience, if the disk is broken, I/O may never return at all, and the process is blocked in the uninterruptible state.
> If the delete fails (read-only filesystem) and we throw an IOException, the server restarts and fails again. If we don't throw the exception, the server will fail on the next write with fsync (usually when writing new epoch files after FLE completes, and then reaching the initLimit timeout).

This reasoning is sophisticated and complicated. It requires too much background knowledge to be convincing.
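The option under debate is to report a failed delete of the empty trailing log as an IOException. It can be sketched in isolation as follows.

This is a minimal, hypothetical sketch and not the PR's code. `EmptyTrailingLogSketch`, `readHeaderMagic` and `openOrDiscardEmptyTrailingLog` are illustrative names. `readHeaderMagic` stands in for `createInputArchive`, under the assumption that opening a log reads a header starting with a 4-byte integer, so a zero-length file raises `EOFException`. The `isLastFile` flag stands in for `storedFiles.isEmpty()`. The diff is cut off right after `this.logFile.delete()`, so its other branches are not shown. Rethrowing the original `EOFException` in those branches is an assumption.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class EmptyTrailingLogSketch {

    // Hypothetical stand-in for createInputArchive(). Assumption: the log header
    // starts with a 4-byte int, so a zero-length file hits EOF immediately.
    static int readHeaderMagic(File logFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(logFile))) {
            return in.readInt(); // EOFException if fewer than 4 bytes are present
        }
    }

    // Returns true if the header was readable, and false if the file was an empty
    // trailing log that has been deleted. A failed delete is reported to the caller
    // as an IOException (the fail-fast option) instead of being ignored.
    static boolean openOrDiscardEmptyTrailingLog(File logFile, boolean isLastFile) throws IOException {
        try {
            readHeaderMagic(logFile);
            return true;
        } catch (EOFException ex) {
            if (isLastFile && logFile.length() == 0) {
                if (!logFile.delete()) {
                    throw new IOException("Failed to delete empty log file " + logFile, ex);
                }
                return false;
            }
            // Assumption: any other truncated header is treated as a real error.
            throw ex;
        }
    }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("log.", ".tmp");
        System.out.println(openOrDiscardEmptyTrailingLog(empty, true)); // false
        System.out.println(empty.exists()); // false
    }
}
```

`File.delete()` only returns `false`, so the sketch cannot say why the delete failed. One possible refinement is `java.nio.file.Files.delete(Path)`. It throws an `IOException` whose subtype and message describe the cause (for example `NoSuchFileException`), which would make the fail-fast error easier to act on.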
> If we don't throw the exception, the server will fail on the next write with fsync (usually when writing new epoch files after FLE completes, and then reaching the initLimit timeout).

That chain of reasoning is much longer than simply throwing an exception here.

> I don't think we can do much more about the deletion failure.

This is exactly the point. So why not report the failure in the first place? It is much simpler to convince ourselves that throwing an exception here is correct than to ignore the failure and rely on sophisticated operational experience.

--
This is an automated message from the Apache Git Service.