[ https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-10780:
--------------------------------------
    Attachment: HDFS-10780.001.patch

More details on Issue 1:

*Problem:*
— After pipeline recovery (from data streaming failures), re-replication of the 
block is not triggered, leaving stale replicas in place
— TestDataNodeHotSwapVolumes fails with the signature “TimeoutException: Timed 
out waiting for /test to reach 3 replicas”

*Analysis:*
— Assume write pipeline DN1 —> DN2 —> DN3
— For the {{UNDER_CONSTRUCTION}} Block, NameNode sets the *expected replicas* 
as DN1, DN2, DN3
— DN1 encounters a write issue (here the volume is removed while write is 
in-progress)
— Client detects pipeline issue, triggers pipeline recovery and gets the new 
write pipeline as DN2 —> DN3

— On a successful {{FSNamesystem::updatePipeline}} request from the Client, 
NameNode bumps up the Generation Stamp (from 001 to 002) of the 
UnderConstruction (that is, the last) block of the file.
— All the current *expected replicas* are now stale, as they have a lower 
Generation Stamp than the new one after the pipeline update.
— NameNode resets the *expected replicas* to the set of storage ids from the 
updated pipeline, which is {DN2, DN3}
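
The bookkeeping above can be sketched as follows. This is a minimal, hypothetical illustration (class and method names like {{UcBlockSketch}} are mine, not actual HDFS code): a successful pipeline update bumps the generation stamp and replaces the expected replica set with the new pipeline's storages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an under-construction block's NameNode-side state.
class UcBlockSketch {
    long genStamp;
    List<String> expectedReplicas;

    UcBlockSketch(long gs, String... dns) {
        this.genStamp = gs;
        this.expectedReplicas = new ArrayList<>(Arrays.asList(dns));
    }

    // Mirrors the effect described above for a successful updatePipeline
    // call: bump the generation stamp and reset the expected replicas to
    // the storages of the recovered pipeline.
    void updatePipeline(long newGenStamp, String... newPipeline) {
        this.genStamp = newGenStamp;
        this.expectedReplicas = new ArrayList<>(Arrays.asList(newPipeline));
    }
}
```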

— DNs send their Incremental Block Reports (IBRs) to NameNode. IBRs can carry 
blocks with either old or new Generation Stamps, and these replica blocks can 
be in any state — FINALIZED, RBW, RWR, etc.
— Assume the stale replica DN1 sends an IBR with the following
— — Replica Block State: RBW
— — Replica Block GS: 001 (stale)
— Assume the good replicas DN2 and DN3 send IBRs with the following
— — Replica Block State: FINALIZED
— — Replica Block GS: 002 (good)


— In {{BlockManager::processAndHandleReportedBlock}}, when processing 
Incremental Block Reports for replica blocks in RBW/RWR states, NameNode does 
not check the block Generation Stamp until the stored block is COMPLETE. Since 
the block state at NN is still UNDER_CONSTRUCTION, the *stale RBW block from 
DN1 gets accepted*
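
A condensed sketch of that acceptance condition, with hypothetical enum and method names (the real logic lives in {{BlockManager::checkReplicaCorrupt}} and is more involved):

```java
// Illustrative sketch only, not the actual BlockManager code.
class IbrCheckSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }
    enum ReplicaState { FINALIZED, RBW, RWR }

    // Returns true if the reported replica should be marked corrupt.
    static boolean isCorrupt(BlockState stored, ReplicaState reported,
                             long storedGs, long reportedGs) {
        if (reported == ReplicaState.RBW || reported == ReplicaState.RWR) {
            // The generation stamp is only compared once the stored block
            // is COMPLETE, so a stale RBW replica is accepted while the
            // block is still UNDER_CONSTRUCTION.
            return stored == BlockState.COMPLETE && reportedGs != storedGs;
        }
        return reportedGs != storedGs;
    }
}
```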

— {{BlockManager::addStoredBlockUnderConstruction}} assumes the replica block 
from the corrupt DN1 to be a good one and adds DN1’s StorageInfo to the 
expected replica locations. Refer: 
{{BlockUnderConstructionFeature::addReplicaIfNotPresent}}. Thus the *expected 
replicas* again become (DN1, DN2, DN3).

— Later, when the Client closes the file, {{FSNamesystem}} moves all the 
*expected replicas* to pendingReconstruction. Refer: 
{{FSNamesystem::addCommittedBlocksToPending}}

— {{BlockManager::checkRedundancy}} mistakenly believes the 
pendingReconstruction count of 1 (for DN1) is currently in progress; adding 
this to the live replica count of 2 (for DN2, DN3), it decides no more 
reconstruction is needed, since the sum matches the configured replication 
factor of 3.
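
The arithmetic behind that mistaken decision, as a hypothetical sketch (names are illustrative, not the actual checkRedundancy code): the pending count, which still holds the stale DN1 replica, is added to the live count, so the sum reaches the replication factor and no reconstruction is scheduled.

```java
// Illustrative redundancy check: reconstruction is only triggered when
// live + pending replicas fall short of the replication factor.
class RedundancySketch {
    static boolean needsReconstruction(int liveReplicas, int pendingReplicas,
                                       int replicationFactor) {
        return liveReplicas + pendingReplicas < replicationFactor;
    }
}
```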

— Since no block reconstruction was triggered for DN1, the test times out 
waiting for the replication factor of 3.


*Fix:*

— I believe the core issue here is in the processing of IBRs from stale 
replicas. Either
— — (A) {{BlockManager::checkReplicaCorrupt}} has to tag the block as corrupt 
when the replica state is RBW and the block is not complete, OR
— — (B) {{BlockManager::addStoredBlockUnderConstruction}} should not add the 
corrupt replica to the *expected replicas* for the under-construction block
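
To make option (B) concrete, here is a minimal hypothetical sketch (not the attached patch, and {{FixBSketch}} is an invented name): a reporting storage is only added to the expected replicas when its reported generation stamp matches the block's current one.

```java
import java.util.List;

// Hypothetical illustration of fix option (B): skip stale replicas when
// updating the expected replica locations of an under-construction block.
class FixBSketch {
    static void addReplicaIfNotPresent(List<String> expectedReplicas,
                                       String storage,
                                       long blockGs, long reportedGs) {
        if (reportedGs != blockGs) {
            return; // stale replica: leave expected replicas untouched
        }
        if (!expectedReplicas.contains(storage)) {
            expectedReplicas.add(storage);
        }
    }
}
```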

The attached patch implements fix (B). I also wrote a unit test that explicitly 
checks the expected replica count under the above sequence of events.

[~eddyxu], [~andrew.wang], [~yzhangal], can you please take a look at the patch?

> Block replication not proceeding after pipeline recovery -- 
> TestDataNodeHotSwapVolumes fails
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10780
>                 URL: https://issues.apache.org/jira/browse/HDFS-10780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. A data write pipeline can have 
> issues such as timeouts or an unreachable data node; in this test case the 
> failure is an induced one, as one of the volumes in a datanode is removed 
> while a block write is in progress. Digging further into the logs, when the 
> problem happens in the write pipeline, the error recovery is not happening 
> as expected, leading to block replication never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
>   Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
> replicas
> Results :
> Tests in error: 
>   
> TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
>  » Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> Following exceptions are not expected in this test run
> {noformat}
>  614 2016-08-10 12:30:11,269 [DataXceiver for client 
> DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block 
> BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG 
> datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number 
> of active connections is: 2
>  615 java.lang.IllegalMonitorStateException
>  616         at java.lang.Object.wait(Native Method)
>  617         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
>  618         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
>  619         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
>  620         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
>  720 2016-08-10 12:30:11,287 [DataNode: 
> [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
>  [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]  heartbeating to 
> localhost/127.0.0.1:58788] ERROR datanode.DataNode 
> (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool 
> BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid 
> 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
>  721 java.lang.NullPointerException
>  722         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
>  723         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
>  724         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
>  725         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
>  726         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
