[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180466#comment-14180466 ]

Colin Patrick McCabe commented on HDFS-7235:
--------------------------------------------

Hi Yongjun,

Thanks for your patience here.  I don't think the current patch is quite ready.
I could point to a few things, like this: {{ReplicaInfo replicaInfo = (ReplicaInfo) data.getReplica(}} -- we shouldn't be downcasting here.

I think the bigger issue is that the interface in FsDatasetSpi is just not very
well suited to what we're trying to do.  Rather than trying to hack around it, I
think we should come up with a better interface.

I think we should replace {{FsDatasetSpi#isValid}} with this function:

{code}
  /**
   * Check if a block is valid.
   *
   * @param b           The block to check.
   * @param minLength   The minimum length that the block must have.  May be 0.
   * @param state       If this is null, it is ignored.  If it is non-null, we
   *                        will check that the replica has this state.
   *
   * @throws FileNotFoundException             If the replica is not found or
   *                                              there was an error locating it.
   * @throws EOFException                      If the replica length is too short.
   * @throws UnexpectedReplicaStateException   If the replica is not in the
   *                                              expected state.
   */
  public void checkBlock(ExtendedBlock b, long minLength, ReplicaState state)
      throws FileNotFoundException, EOFException, UnexpectedReplicaStateException;
{code}

Since this function will throw a clearly marked exception detailing which case 
we're in, we won't have to call multiple functions.  This will be better for 
performance since we're only taking the lock once.  This will also be better 
for clarity, since the current APIs lead to some rather complex code.
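
For illustration, a call site could then look roughly like this (just a sketch; {{data}}, {{block}}, {{visibleLength}} and {{reportBadBlock}} are made-up names, not anything in the current code):

{code}
  // Hypothetical call site: one call replaces the separate validity checks,
  // and each failure mode arrives as a distinct exception.
  try {
    data.checkBlock(block, visibleLength, ReplicaState.FINALIZED);
  } catch (FileNotFoundException e) {
    // Replica is missing or unreadable (e.g. bad disk): surface it rather
    // than silently refusing the transfer.
    reportBadBlock(block);
  } catch (EOFException e) {
    // Replica is shorter than the requested minimum length.
    throw new IOException("Replica " + block + " is too short", e);
  } catch (UnexpectedReplicaStateException e) {
    // Replica exists but is not in the FINALIZED state.
    throw new IOException("Replica " + block + " is in an unexpected state", e);
  }
{code}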

We could also get rid of {{FsDatasetSpi#isValidRbw}}, since the new function can
do everything that it does (see the sketch below).
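
To make the mapping concrete, the existing boolean checks could be expressed as thin wrappers over the new call (a rough sketch only, assuming the wrappers live in the default implementation such as {{FsDatasetImpl}}; whether we keep them at all is a separate question):

{code}
  // Sketch: isValid/isValidRbw in terms of checkBlock.  Any of the checked
  // exceptions simply means "not valid" for the boolean API.
  @Override // FsDatasetSpi
  public boolean isValid(final ExtendedBlock b) {
    try {
      checkBlock(b, 0, ReplicaState.FINALIZED);
    } catch (IOException e) {
      return false;
    }
    return true;
  }

  @Override // FsDatasetSpi
  public boolean isValidRbw(final ExtendedBlock b) {
    try {
      checkBlock(b, 0, ReplicaState.RBW);
    } catch (IOException e) {
      return false;
    }
    return true;
  }
{code}
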
Also, {{UnexpectedReplicaStateException}} could be a new exception under
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/UnexpectedReplicaStateException.java.
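
Something along these lines, perhaps (just a sketch of the shape; the constructors and message text are placeholders):

{code}
package org.apache.hadoop.hdfs.server.datanode;

import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.server.common.HdfsServerConstants.ReplicaState;

/**
 * Thrown when a replica exists but is not in the state the caller asked for.
 * Extending IOException keeps it easy to handle together with the other
 * checked exceptions thrown by checkBlock.
 */
public class UnexpectedReplicaStateException extends IOException {
  private static final long serialVersionUID = 1L;

  public UnexpectedReplicaStateException() {
    super();
  }

  public UnexpectedReplicaStateException(ExtendedBlock b,
      ReplicaState expectedState) {
    super("Replica " + b + " is not in the expected state " + expectedState);
  }

  public UnexpectedReplicaStateException(String msg) {
    super(msg);
  }
}
{code}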

I think it's fine to change FsDatasetSpi for this (we did it when adding 
caching stuff, and again when adding "trash").

Let me know what you think.  I think it would make things a lot more clear.

> Can not decommission DN which has invalid block due to bad disk
> ---------------------------------------------------------------
>
>                 Key: HDFS-7235
>                 URL: https://issues.apache.org/jira/browse/HDFS-7235
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
> HDFS-7235.003.patch
>
>
> When decommissioning a DN, the process hangs.
> What happens is, when the NN chooses a replica as a source to replicate the data on
> the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned
> DN itself as the source of the transfer (see BlockManager.java).
> However, because of the bad disk, the DN detects the source block to be
> transferred as an invalid block, via the following logic in FsDatasetImpl.java:
> {code}
>   /** Does the block exist and have the given state? */
>   private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>     final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), 
>         b.getLocalBlock());
>     return replicaInfo != null
>         && replicaInfo.getState() == state
>         && replicaInfo.getBlockFile().exists();
>   }
> {code}
> The reason that this method returns false (detecting an invalid block) is
> that the block file doesn't exist, due to the bad disk in this case.
> The key issue we found here is that after the DN detects an invalid block for the
> above reason, it doesn't report the invalid block back to the NN, so the NN doesn't
> know that the block is corrupted and keeps sending the data transfer request
> to the same to-be-decommissioned DN, again and again. This causes an infinite
> loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
