[ 
https://issues.apache.org/jira/browse/HDFS-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120426#comment-14120426
 ] 

Colin Patrick McCabe commented on HDFS-6867:
--------------------------------------------

A few comments on the first version (pre-rebase):

{code}
+        DFSClient.LOG.debug("RecoveryWorker for block " + block + " running");
{code}

This should be INFO, and I would also include the client name and file name.

{code}
+//    private boolean atRisk = false;
...
+//          this.atRisk = true;
{code}

I think you meant to remove this.

Please don't use Thread.interrupt on the RecoveryWorker when closing.  It gets 
very messy (Thread.interrupt can close sockets and FDs that the thread is 
listening on, etc.)  Instead, use some kind of Atomic or condition variable to 
communicate with this thread.

{code}
+              underBackgroundRecovery = true;
+              addDatanode2ExistingPipeline();
+              pipelineRecovered = true;
+              underBackgroundRecovery = false;
{code}

This will leave you in a bad state if there is an exception or other unexpected 
control flow issue.

We should think about how to manage the communication between the writer thread 
and recovery thread.  There are a few canonical ways:
* locks plus condition variables
* AtomicReference with a state, implementing a state machine
* locks plus polling

volatile booleans are almost always a mistake.

> For DFSOutputStream, do pipeline recovery for a single block in the background
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-6867
>                 URL: https://issues.apache.org/jira/browse/HDFS-6867
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Colin Patrick McCabe
>            Assignee: Zhe Zhang
>         Attachments: HDFS-6867-20140827-2.patch, HDFS-6867-20140827-3.patch, 
> HDFS-6867-20140827.patch, HDFS-6867-20140828-1.patch, 
> HDFS-6867-20140828-2.patch, HDFS-6867-design-20140820.pdf, 
> HDFS-6867-design-20140821.pdf, HDFS-6867-design-20140822.pdf, 
> HDFS-6867-design-20140827.pdf
>
>
> For DFSOutputStream, we should be able to do pipeline recovery in the 
> background, while the user is continuing to write to the file.  This is 
> especially useful for long-lived clients that write to an HDFS file slowly. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to