[ 
https://issues.apache.org/jira/browse/HDFS-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713408#comment-17713408
 ] 

ASF GitHub Bot commented on HDFS-16985:
---------------------------------------

smarthanwang commented on PR #5564:
URL: https://github.com/apache/hadoop/pull/5564#issuecomment-1512506592

   > If there's only one replica, and AWS EBS fails, doesn't it mean the file is not recoverable anyway?
   
   1. The EBS volume may not have really failed; it may recover soon.
   2. If the EBS volume really has failed, we can recover it manually, so the data wouldn't be lost.




> delete local block file when FileNotFoundException occurred may lead to 
> missing block.
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-16985
>                 URL: https://issues.apache.org/jira/browse/HDFS-16985
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Chengwei Wang
>            Assignee: Chengwei Wang
>            Priority: Major
>              Labels: pull-request-available
>
> We encountered several missing-block problems in our production cluster, 
> which runs HDFS on AWS EC2 + EBS.
> The root cause:
>  # The block has only 1 replica left and hasn't been reconstructed yet.
>  # The DN checks whether the block file exists when constructing a BlockSender.
>  # The check against EBS fails and throws FileNotFoundException (EBS may be 
> in a fault condition).
>  # The DN calls invalidateBlock and schedules asynchronous deletion of the block.
>  # EBS is already back to normal by the time the DN deletes the block.
>  # The block file is deleted permanently and can't be recovered.
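
The sequence in the root-cause list can be replayed with a small, self-contained sketch. It is only an illustration under stated assumptions: `FlakyVolume`, `replay`, and the `deleteFileOnInvalidate` switch are hypothetical names, not HDFS classes or the change made in the PR, and the asynchronous deletion is modelled as a plain sequential step so the outcome is deterministic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingBlockSketch {

    /** Hypothetical stand-in for an EBS-backed volume that can be briefly unreadable. */
    static final class FlakyVolume {
        private boolean healthy = true;

        void setHealthy(boolean healthy) {
            this.healthy = healthy;
        }

        /** While the volume is faulty, every file looks as if it does not exist. */
        boolean exists(Path file) {
            return healthy && Files.exists(file);
        }
    }

    /**
     * Replays the reported sequence (hypothetical model, not DataNode code).
     * Returns true if the last replica's file is still on disk at the end.
     */
    static boolean replay(boolean deleteFileOnInvalidate) {
        try {
            Path blockFile = Files.createTempFile("blk_", ".data"); // the only replica left
            FlakyVolume volume = new FlakyVolume();

            volume.setHealthy(false);                  // EBS enters a fault condition
            boolean found = volume.exists(blockFile);  // existence check before serving a read
            boolean invalidated = !found;              // "file not found" leads to invalidation

            volume.setHealthy(true);                   // EBS recovers before the deletion runs
            if (invalidated && deleteFileOnInvalidate) {
                Files.deleteIfExists(blockFile);       // the deferred deletion removes real data
            }

            boolean survived = Files.exists(blockFile);
            Files.deleteIfExists(blockFile);           // clean up the temp file
            return survived;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("delete on invalidate: file survives = " + replay(true));
        System.out.println("keep file on invalidate: file survives = " + replay(false));
    }
}
```

With `replay(true)` the file is gone even though the volume recovered, which matches the missing-block symptom; with `replay(false)` the file remains on disk. Leaving the file in place on invalidation is shown only as one conceivable mitigation for comparison.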



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
