[
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190243#comment-15190243
]
Hadoop QA commented on HDFS-9874:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m
0s {color} | {color:green} The patch appears to include 1 new or modified test
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m
11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 54m 40s
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.8.0_74.
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 23s {color}
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
20s {color} | {color:green} Patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 133m 4s {color}
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12792577/HDFS-9874-trunk-2.patch
|
| JIRA Issue | HDFS-9874 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite
unit findbugs checkstyle |
| uname | Linux 761ed470b5bb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
|
| git revision | trunk / 79961ec |
| Default Java | 1.7.0_95 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_74
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| findbugs | v3.0.0 |
| whitespace |
https://builds.apache.org/job/PreCommit-HDFS-Build/14783/artifact/patchprocess/whitespace-eol.txt
|
| unit |
https://builds.apache.org/job/PreCommit-HDFS-Build/14783/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
|
| unit test logs |
https://builds.apache.org/job/PreCommit-HDFS-Build/14783/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
|
| JDK v1.7.0_95 Test Results |
https://builds.apache.org/job/PreCommit-HDFS-Build/14783/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
hadoop-hdfs-project/hadoop-hdfs |
| Console output |
https://builds.apache.org/job/PreCommit-HDFS-Build/14783/console |
| Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |
This message was automatically generated.
> Long living DataXceiver threads cause volume shutdown to block.
> ---------------------------------------------------------------
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.0
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Priority: Critical
> Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch,
> HDFS-9874-trunk.patch
>
>
> One of the failed volume shutdown took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing
> failed volume volumeA/current:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not
> writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
> at
> org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
> at
> org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing
> scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO
> datanode.VolumeScanner: VolumeScanner(volumeA,
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN
> datanode.VolumeScanner: VolumeScanner(volumeA,
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException:
> volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669)
> at
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314)
> at
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to
> delete old dfsUsed file in
> volumeA/current/BP-1788428031-nnIp-1351700107344/current
> 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to
> write dfsUsed to
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed
> java.io.FileNotFoundException:
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only
> file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:328)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:250)
> at
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-24 16:05:53,286 [Thread-49277] INFO impl.FsDatasetImpl: Removed
> volume: volumeA/current
> 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Completed
> checkDirs. Found 1 failure volumes.
> 2016-02-24 16:05:53,286 [Thread-49277] INFO datanode.DataNode: Deactivating
> volumes (clear failure=false): volumeA
> 2016-02-24 16:05:53,286 [Thread-49277] INFO impl.FsDatasetImpl: Removing
> volumeA from FsDataset.
> 2016-02-24 16:05:53,342 [Thread-49277] INFO common.Storage: Removing block
> level storage: volumeA/current/BP-1788428031-nnIp-1351700107344
> 2016-02-24 16:05:53,345 [Thread-49277] WARN datanode.DataNode:
> DataNode.handleDiskError: Keep Running: true
> {noformat}
> The datanode waits for the reference count to become zero before shutting
> down the volume.
> {code:title=FsVolumeImpl.java|borderStyle=solid}
> while (this.reference.getReferenceCount() > 0) {
> if (FsDatasetImpl.LOG.isDebugEnabled()) {
> FsDatasetImpl.LOG.debug(String.format(
> "The reference count for %s is %d, wait to be 0.",
> this, reference.getReferenceCount()));
> }
> try {
> Thread.sleep(SLEEP_MILLIS);
> } catch (InterruptedException e) {
> throw new IOException(e);
> }
> }
> {code}
> Just before datanode logged the following line,
> {noformat}
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to
> delete old dfsUsed file in
> volumeA/current/BP-1788428031-nnIp-1351700107344/current
> {noformat}
> I saw the following stack trace
> {noformat}
> 2016-02-24 16:05:53,017 [DataXceiver for client
> DFSClient_NONMAPREDUCE_1651663681_1 at /upStreamDNIp:55710 [Receiving block
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832821906]] INFO
> datanode.DataNode: Exception for
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832909736
> java.io.IOException: Premature EOF from inputStream
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:801)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-24 16:05:53,018 [PacketResponder:
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832909736,
> type=LAST_IN_PIPELINE, downstreams=0:[]] INFO datanode.DataNode:
> PacketResponder:
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832909736,
> type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
> 2016-02-24 16:05:53,018 [PacketResponder:
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832909736,
> type=LAST_IN_PIPELINE, downstreams=0:[]] INFO datanode.DataNode:
> PacketResponder:
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832909736,
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2016-02-24 16:05:53,018 [DataXceiver for client
> DFSClient_NONMAPREDUCE_1651663681_1 at /upStreamDNIp:55710 [Receiving block
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832821906]] INFO
> datanode.DataNode: opWriteBlock
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832909736 received
> exception java.io.IOException: Prematur
> e EOF from inputStream
> 2016-02-24 16:05:53,018 [DataXceiver for client
> DFSClient_NONMAPREDUCE_1651663681_1 at /upStreamDNIp:55710 [Receiving block
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832821906]] ERROR
> datanode.DataNode: thisDNName:1004:DataXceiver error processing WRITE_BLOCK
> operation src: /upStreamDNIp:55710 dst: /thisDNIp
> :1004
> java.io.IOException: Premature EOF from inputStream
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:801)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> On tracking the block blk_7059462432_1109832821906, I see that the block was
> created on 2016-02-17 15:06:28,256
> {noformat}
> 2016-02-17 15:06:28,928 [DataXceiver for client
> DFSClient_NONMAPREDUCE_1651663681_1 at /upStreamDNIp:55590 [Receiving block
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832821906]] INFO
> datanode.DataNode: Receiving
> BP-1788428031-nnIp-1351700107344:blk_7059462432_1109832821906 src:
> /upStreamDNIp:55590 dest: /thisDNIp:1004
> {noformat}
> The job which created this file was running for more than 7 days and the
> client eventually failed to renew the delegation token so the lease manager
> failed to renew lease for this file.
> Once commitBlockSynchronization kicked in, it tried to recover the block and
> eventually the DataXceiver thread died decrementing the reference count.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)