[ 
https://issues.apache.org/jira/browse/HDFS-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094015#comment-16094015
 ] 

Yongjun Zhang commented on HDFS-11799:
--------------------------------------

HI [~brahmareddy],

Thanks for the updated patch and really worry for the delayed review. It in 
general looks pretty good. I have some comments about cosmetics things:

1. Suggest to change
{code}
DFSClient.LOG.warn(
              "Failed to add a new datanode for write pipeline, minimum block "
                  + "replication:"
                  + dfsClient.dtpReplaceDatanodeOnFailureReplication
                  + ", good datanode size: " + nodes.length, ioe);
{code}
to
{code}
DFSClient.LOG.warn(
              "Failed to find a new datanode to add to the write pipeline, "
                  " continue to write to the pipeline with " + nodes.length
                  " nodes since it's bigger than minimum replication: "
                  + dfsClient.dtpReplaceDatanodeOnFailureReplication
                  + " configued by "
                  + 
HdfsClientConfigKeys.BlockWrite.ReplaceDatanodeOnFailure.REPLICATION,
                  + ".", ioe);
{code}

2. Seems we don't need to add two new entries and call them deprecated:
{code}
  @Deprecated
  public static final String 
DFS_CLIENT_WRITE_REPLACE_DATANODE_ON_FAILURE_REPLICATION =
      HdfsClientConfigKeys.BlockWrite.ReplaceDatanodeOnFailure.REPLICATION;
  @Deprecated
  public static final short 
DFS_CLIENT_WRITE_REPLACE_DATANODE_ON_FAILURE_REPLICATION_DEFAULT =
      
HdfsClientConfigKeys.BlockWrite.ReplaceDatanodeOnFailure.REPLICATION_DEFAULT;
{code}
So we can drop them right?

3. in hdfs-default.xml, what about change it to:
{code}
      The minimum number of replications that are needed to not to fail
      the write pipeline if new datanodes can not be found to replace
      failed datanodes (could be due to network failure) in the write pipeline.
      If the number of the remaining datanodes in the write pipeline is greater 
than or
      equal to this property value, continue writing to the remaining nodes.
      Otherwise throw exception.

      If this is set to 0 or a negative number, an exception will be thrown
      when a replacement can not be found.
      See also dfs.client.block.write.replace-datanode-on-failure.policy
{code}

4. Thanks for the comprehensive tests.  I notice that the comment of test code 
sometimes don't match the code, e.g.
{code}
54   /** Test fail all the datanodes except first in the pipeline. */
255   @Test public void testWithOnlyLastDatanodeIsAlive() throws Exception {
{code}

and
{code}
317   /** Test fail all the datanodes except first in the pipeline. */
318   @Test
319   public void 
testLessNumberOfLiveDatanodesThanWriteReplaceDatanodeOnFailureRF()
{code}

5. Possible to consolidate simliar part of the tests into one helper method 
with a parameter to indicate which 
DNs to stop, to avoid redundant code?


6. Seems even for the test that exception is thrown due to failure to find new 
datanode, the
result file can still be verified to have the expected full content with 
verifyFileContent()?


> Introduce a config to allow setting up write pipeline with fewer nodes than 
> replication factor
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11799
>                 URL: https://issues.apache.org/jira/browse/HDFS-11799
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Brahma Reddy Battula
>         Attachments: HDFS-11799-002.patch, HDFS-11799-003.patch, 
> HDFS-11799.patch
>
>
> During pipeline recovery, if not enough DNs can be found, if 
> dfs.client.block.write.replace-datanode-on-failure.best-effort
> is enabled, we let the pipeline to continue, even if there is a single DN.
> Similarly, when we create the write pipeline initially, if for some reason we 
> can't find enough DNs, we can have a similar config to enable writing with a 
> single DN.
> More study will be done.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to