[
https://issues.apache.org/jira/browse/HDFS-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947735#comment-14947735
]
Hadoop QA commented on HDFS-9137:
---------------------------------
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 23s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 1s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 35s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 17s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 23s | The applied patch generated 1 new checkstyle issues (total was 142, now 142). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 189m 28s | Tests failed in hadoop-hdfs. |
| | | 235m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestRecoverStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12765432/HDFSS-9137.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 99e5204 |
| Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12838/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12838/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12838/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12838/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12838/console |
This message was automatically generated.
> DeadLock between DataNode#refreshVolumes and
> BPOfferService#registrationSucceeded
> ----------------------------------------------------------------------------------
>
> Key: HDFS-9137
> URL: https://issues.apache.org/jira/browse/HDFS-9137
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.7.1
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-9137.00.patch,
> HDFS-9137.01-WithPreservingRootExceptions.patch, HDFSS-9137.02.patch
>
>
> The code paths in DataNode#refreshVolumes and
> BPOfferService#registrationSucceeded can deadlock.
> In practice this should be rare, since it requires a user to call
> refreshVolumes while the DN is registering with the NN, but it can happen.
> Reason for the deadlock:
> 1) refreshVolumes is called with the DN lock held, and at the end it
> also triggers a block report. In the block report call,
> BPServiceActor#triggerBlockReport calls toString on bpos, which takes the
> read lock on bpos.
> Lock order: DN lock, then bpos lock.
> 2) BPOfferService#registrationSucceeded takes the write lock on bpos and
> calls dn.bpRegistrationSucceeded, which is a synchronized call on DN.
> Lock order: bpos lock, then DN lock.
> So the two paths take the same two locks in opposite order and can deadlock.
> I think a simple fix would be to move the triggerBlockReport call outside of
> the DN lock; I don't think that call really needs to be made under the DN lock.
> Thoughts?
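
The two lock orders in the description, and the proposed change of moving the block report trigger out of the DN lock, can be modeled with a small self-contained Java sketch. This is not the Hadoop code and not the attached patch: the class LockOrderSketch and the fields dnLock and bposLock are hypothetical stand-ins for the DataNode monitor and the BPOfferService read/write lock, and the method bodies are placeholders that only reproduce the locking pattern.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockOrderSketch {
    // Hypothetical stand-in for the DataNode monitor (the "DN lock").
    static final Object dnLock = new Object();
    // Hypothetical stand-in for the BPOfferService lock (the "bpos lock").
    static final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    // Models BPServiceActor#triggerBlockReport calling toString on bpos:
    // it needs the bpos read lock. The guard enforces the rule proposed in
    // the issue, that this call is not made while the DN lock is held.
    static String triggerBlockReport() {
        if (Thread.holdsLock(dnLock)) {
            throw new IllegalStateException("DN lock held; would order DN lock -> bpos lock");
        }
        bposLock.readLock().lock();
        try {
            return "block report triggered";
        } finally {
            bposLock.readLock().unlock();
        }
    }

    // Models refreshVolumes after the proposed change: the volume work runs
    // under the DN lock, and the block report is triggered after release.
    static String refreshVolumes() {
        synchronized (dnLock) {
            // Placeholder for the volume reconfiguration work.
        }
        return triggerBlockReport();
    }

    // Models BPOfferService#registrationSucceeded: bpos write lock first,
    // then the synchronized call on the DN. This order is left unchanged.
    static String registrationSucceeded() {
        bposLock.writeLock().lock();
        try {
            synchronized (dnLock) {
                return "registered";
            }
        } finally {
            bposLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(refreshVolumes());
        System.out.println(registrationSucceeded());
    }
}
```

In this model no thread ever waits for the bpos lock while holding the DN lock, so the only remaining order is bpos lock then DN lock and the cycle from the description cannot form. Whether triggerBlockReport can safely run outside the DN lock in the real DataNode is the open question raised in the description.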
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)