[ https://issues.apache.org/jira/browse/HDFS-14694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190977#comment-17190977 ]
Ayush Saxena commented on HDFS-14694:
-------------------------------------

Thanx [~leosun08] for the update. A couple of comments:

{code:java}
+    if (recoverOnCloseException) {
+      try {
+        dfsClient.endFileLease(fileId);
+        dfsClient.recoverLease(src);
+        leaseRecovered = true;
+      } catch (Exception e) {
+        // Ignore exception rendered by recoverLease. Throw original
+        // exception
+      }
+    }
{code}
* This can be refactored into a single method and used in both places. Add a javadoc to the new method explaining what it does.

{code:java}
-  private void completeFile() throws IOException {
+  protected void completeFile() throws IOException {
{code}
* Add {{@VisibleForTesting}}, and I think default (package-private) access should be enough?

{code:java}
+  public boolean isLeaseRecovered() {
{code}
* Similarly here.

{code:java}
+  String RECOVER_ON_CLOSE_EXCEPTION_KEY =
+      PREFIX + "recover.on.close.exception";
{code}
* Can we change this to {{recover.lease.on.close.exception}}?

{code:java}
    GenericTestUtils.waitFor(() -> {
      boolean closed;
      try {
        closed = cluster.getFileSystem().isFileClosed(
            new Path("/testExceptionInCloseWithoutRecoverLease"));
      } catch (IOException e) {
        return false;
      }
      return closed;
    }, 1000, 5000);
  }
{code}
* The wait can be refactored into a method and reused in both tests.

Nit:
{code:java}
+    Assert.assertTrue(cluster.getFileSystem()
+    Assert.assertFalse(cluster.getFileSystem().isFileClosed(
{code}
* No need for {{Assert.}}; {{assertTrue}} and {{assertFalse}} are already statically imported.
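The duplicated recover-on-close block could be pulled into one helper roughly as below. This is a minimal, self-contained sketch of the pattern only: {{DfsClientLike}} and {{recoverLeaseOnFailure}} are hypothetical stand-ins for the real {{DFSClient}} calls in the patch, not the actual Hadoop API.

```java
import java.io.IOException;

public class RecoverOnCloseSketch {
  // Hypothetical stand-in for the two DFSClient calls used in the patch.
  interface DfsClientLike {
    void endFileLease(long fileId) throws IOException;
    void recoverLease(String src) throws IOException;
  }

  private boolean leaseRecovered = false;

  /**
   * Attempts lease recovery after a failed close. Any exception thrown by
   * the recovery itself is swallowed so the caller can rethrow the
   * original close exception, as the review suggests documenting.
   */
  boolean recoverLeaseOnFailure(DfsClientLike client, long fileId, String src) {
    try {
      client.endFileLease(fileId);
      client.recoverLease(src);
      leaseRecovered = true;
    } catch (Exception e) {
      // Ignore: the original close exception is what propagates.
    }
    return leaseRecovered;
  }

  boolean isLeaseRecovered() {
    return leaseRecovered;
  }
}
```

Both call sites would then invoke the single helper instead of duplicating the try/catch.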
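On the key rename: assuming {{PREFIX}} resolves to {{dfs.client.}} (as in {{HdfsClientConfigKeys}}; an assumption, since the patch context is truncated here), the renamed option would be enabled in hdfs-site.xml roughly like:

```xml
<!-- Sketch only: the final key name depends on PREFIX and on whether the
     suggested rename to recover.lease.on.close.exception is adopted. -->
<property>
  <name>dfs.client.recover.lease.on.close.exception</name>
  <value>true</value>
</property>
```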
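The polling wait could likewise become one shared test helper. Below is a self-contained, simplified stand-in for the {{GenericTestUtils.waitFor}} pattern used in the test (the real Hadoop utility throws on timeout; this sketch just returns a boolean):

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  /**
   * Polls check every intervalMs until it returns true or timeoutMs
   * elapses; returns the final result of the check. A simplified
   * stand-in for GenericTestUtils.waitFor(check, 1000, 5000).
   */
  static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (check.getAsBoolean()) {
        return true;
      }
      Thread.sleep(intervalMs);
    }
    // One last check at the deadline before giving up.
    return check.getAsBoolean();
  }
}
```

Both tests would then call a single {{waitForFileClosed(path)}}-style method built on this, instead of inlining the lambda twice.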
> Call recoverLease on DFSOutputStream close exception
> ----------------------------------------------------
>
>                 Key: HDFS-14694
>                 URL: https://issues.apache.org/jira/browse/HDFS-14694
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Chen Zhang
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14694.001.patch, HDFS-14694.002.patch,
> HDFS-14694.003.patch, HDFS-14694.004.patch, HDFS-14694.005.patch,
> HDFS-14694.006.patch, HDFS-14694.007.patch, HDFS-14694.008.patch,
> HDFS-14694.009.patch, HDFS-14694.010.patch, HDFS-14694.011.patch
>
>
> HDFS uses file leases to manage opened files; when a file is not closed
> normally, the NN recovers the lease automatically after the hard limit is
> exceeded. But for a long-running service (e.g. HBase), the hdfs-client never
> dies, so the NN never gets a chance to recover the file.
> Usually the client program needs to handle exceptions itself to avoid this
> condition (e.g. HBase automatically calls recoverLease for files that were
> not closed normally), but in our experience most services (in our company)
> don't handle this condition properly, which leaves lots of files in an
> abnormal state or even causes data loss.
> This Jira proposes adding a feature that calls the recoverLease operation
> automatically when DFSOutputStream close encounters an exception. It should
> be disabled by default, but anyone building a long-running service on HDFS
> can enable this option.
> We've had this feature in our internal Hadoop distribution for more than 3
> years, and it has been quite useful in our experience.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)