[
https://issues.apache.org/jira/browse/HDFS-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716339#comment-14716339
]
Hadoop QA commented on HDFS-8763:
---------------------------------
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 51s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 40s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 67m 21s | Tests failed in hadoop-hdfs. |
| | | 110m 17s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.TestLeaseRecovery |
| Timed out tests | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752681/HDFS-8763.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4cbbfa2 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12158/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12158/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12158/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12158/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12158/console |
This message was automatically generated.
> After file closed, a race condition between IBR of 3rd replica of lastBlock
> and ReplicationMonitor
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-8763
> URL: https://issues.apache.org/jira/browse/HDFS-8763
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: HDFS
> Affects Versions: 2.4.0
> Reporter: jiangyu
> Assignee: Walter Su
> Priority: Minor
> Attachments: HDFS-8763.01.patch
>
>
> -For our cluster, the NameNode is always very busy, so for every incremental
> block report the lock contention is heavy.-
> -The logic of the incremental block report is as follows: the client sends the
> block to dn1, dn1 mirrors it to dn2, and dn2 mirrors it to dn3. After the
> block is finished, every datanode reports the newly received block to the
> NameNode. On the NameNode side, all reports go through the method
> processIncrementalBlockReport in the BlockManager class. But the status of the
> block reported from dn2 and dn3 is RECEIVING_BLOCK, while for dn1 it is
> RECEIVED_BLOCK. It is fine if dn2 and dn3 report before dn1 (that is common),
> but in a busy environment it is easy for dn1 to report before dn2 or dn3;
> let's assume dn2 reports first, dn1 second, and dn3 third.-
> -So dn1's addStoredBlock will find that the replica count of this block has
> not reached the original number (which is 3), the block will be added to the
> neededReplications structure, and soon some node in the pipeline (dn1 or dn2)
> will be asked to replicate it to dn4. After some time, dn4 and dn3 both report
> this block, and then one node is chosen to invalidate.-
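> To make that ordering concrete, the following is a minimal, hypothetical Java
> sketch (not the real BlockManager code; the class, method, and datanode names
> are simplified placeholders) of how an out-of-order IBR for the last replica
> can first queue the block for re-replication and later leave an excess replica
> to be invalidated:
> {code:java}
> import java.util.LinkedHashSet;
> import java.util.Set;
>
> // Illustrative toy model of replica accounting for one block; not HDFS code.
> public class IbrRaceSketch {
>   static final int REPLICATION = 3;                        // target replica count
>   static final Set<String> stored = new LinkedHashSet<>(); // replicas the NN knows about
>   static final Set<String> neededReplications = new LinkedHashSet<>();
>   static final Set<String> excess = new LinkedHashSet<>();
>   static boolean complete = false;
>
>   // File close: the block is completed with whatever IBRs have arrived so far.
>   static void completeBlock(String blk) {
>     complete = true;
>     if (stored.size() < REPLICATION) {
>       neededReplications.add(blk);  // ReplicationMonitor will schedule an extra copy
>     }
>   }
>
>   // Incremental block report of a received replica.
>   static void receivedIbr(String dn, String blk) {
>     stored.add(dn);
>     if (complete && stored.size() > REPLICATION) {
>       excess.add(dn);               // more replicas than needed; here we simply
>     }                               // mark the latest reporter as the excess one
>   }
>
>   public static void main(String[] args) {
>     String blk = "blk_3194502674_2121080184";
>     receivedIbr("dn2", blk);
>     receivedIbr("dn1", blk);
>     completeBlock(blk);             // only 2 of 3 replicas reported -> under-replicated
>     receivedIbr("dn4", blk);        // re-replication target reports
>     receivedIbr("dn3", blk);        // 3rd original replica finally reports -> excess
>     System.out.println("needed=" + neededReplications + " excess=" + excess);
>   }
> }
> {code}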
> Here is one log I found in our cluster:
> {noformat}
> 2015-07-08 01:05:34,675 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /logs/***_bigdata_spam/logs/application_1435099124107_470749/xx.xx.4.62_45454.tmp. BP-1386326728-xx.xx.2.131-1382089338395 blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a7c0f8f6-2399-4980-9479-efa08487b7b3:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-c75145a0-ed63-4180-87ee-d48ccaa647c5:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW]]}
> 2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.7.75:50010 is added to blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} size 0
> 2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.4.62:50010 is added to blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} size 0
> 2015-07-08 01:05:35,003 INFO BlockStateChange: BLOCK* ask xx.xx.4.62:50010 to replicate blk_3194502674_2121080184 to datanode(s) xx.xx.4.65:50010
> 2015-07-08 01:05:35,403 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.7.73:50010 is added to blk_3194502674_2121080184 size 67750
> 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.4.65:50010 is added to blk_3194502674_2121080184 size 67750
> 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_3194502674_2121080184 to xx.xx.7.75:50010
> 2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* chooseExcessReplicates: (xx.xx.7.75:50010, blk_3194502674_2121080184) is added to invalidated blocks set
> 2015-07-08 01:05:35,852 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask xx.xx.7.75:50010 to delete [blk_3194502674_2121080184, blk_3194497594_2121075104]
> {noformat}
> On some days the number of such occurrences can reach 400,000, which is bad
> for performance and wastes network bandwidth.
> Our base version is Hadoop 2.4, and I checked Hadoop 2.7.1 and did not find
> any difference.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)