[jira] [Work logged] (HDFS-16598) All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...

ASF GitHub Bot (Jira) Tue, 07 Jun 2022 05:54:22 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-16598?focusedWorklogId=779073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779073
 ]


ASF GitHub Bot logged work on HDFS-16598:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/22 12:53
            Start Date: 07/Jun/22 12:53
    Worklog Time Spent: 10m 
      Work Description: Hexiaoqiao commented on PR #4366:
URL: https://github.com/apache/hadoop/pull/4366#issuecomment-1148627690

   > getReplicaInfo(ExtendedBlock b) will check gs, and getReplicaInfo(String 
bpid, long blkid) will not check the gs.
   
   @ZanderXu Thanks for the great catch here.
   
   > I would like to ask a question: after reading your discussion, is it 
possible that the client's block GS is smaller than the DN's in every place 
where getReplicaInfo(String bpid, long blkid) is called?
   
   It is a good question. 
   IMO, it is not necessary to compare the GS in any case when acquiring the 
fine-grained lock for BLOCK_POOL or VOLUME, because neither of them depends on 
the block. I just suggest improving them together in one PR.
   Thanks again.
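
   To make the distinction discussed above concrete, here is a minimal,
self-contained sketch of the two lookup styles. It is not the DataNode code:
the class ReplicaLookupSketch, its Replica type, the map layout, and the
choice to throw IllegalStateException on a mismatch are all assumptions made
for illustration. Only the idea comes from the thread: one lookup compares the
generation stamp (GS) and the other does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two lookups; not the actual HDFS classes.
public class ReplicaLookupSketch {
    /** Stand-in for a stored replica: block ID plus generation stamp (GS). */
    static final class Replica {
        final long blockId;
        final long genStamp;

        Replica(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    // Replicas keyed by "blockPoolId:blockId" (layout is an assumption).
    private final Map<String, Replica> replicas = new HashMap<>();

    private static String key(String bpid, long blockId) {
        return bpid + ":" + blockId;
    }

    void add(String bpid, long blockId, long genStamp) {
        replicas.put(key(bpid, blockId), new Replica(blockId, genStamp));
    }

    /** Lookup by block pool and block ID only: the GS is not compared. */
    Replica getReplicaInfo(String bpid, long blockId) {
        return replicas.get(key(bpid, blockId));
    }

    /** Lookup that also compares the caller's GS with the stored one. */
    Replica getReplicaInfo(String bpid, long blockId, long callerGenStamp) {
        Replica r = getReplicaInfo(bpid, blockId);
        if (r == null || r.genStamp != callerGenStamp) {
            // Assumption: a mismatch is reported as an exception in this sketch.
            throw new IllegalStateException("replica not found for GS " + callerGenStamp);
        }
        return r;
    }

    public static void main(String[] args) {
        ReplicaLookupSketch dn = new ReplicaLookupSketch();
        dn.add("BP-1", 1001L, 5L); // the DN already holds the newer GS

        // The client still carries the older GS 4.
        System.out.println("lookup without GS found: " + (dn.getReplicaInfo("BP-1", 1001L) != null));
        try {
            dn.getReplicaInfo("BP-1", 1001L, 4L);
        } catch (IllegalStateException e) {
            System.out.println("lookup with GS failed: " + e.getMessage());
        }
    }
}
```

   In this model, a caller that only needs to locate the block pool or volume
for locking can use the lookup that ignores the GS, so a stale client GS does
not cause a failure at that step.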




Issue Time Tracking
-------------------

    Worklog Id:     (was: 779073)
    Time Spent: 2h  (was: 1h 50m)

> All datanodes 
> [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]]
>  are bad. Aborting...
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16598
>                 URL: https://issues.apache.org/jira/browse/HDFS-16598
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with a 
> stack trace like:
> {code:java}
> java.io.IOException: All datanodes [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] are bad. Aborting...
>       at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667)
>       at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601)
>       at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587)
>       at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371)
>       at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674)
> {code}
> Tracing the root cause shows that this bug was introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534], because the 
> client's block GS may be smaller than the DN's when pipeline recovery fails.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
