[ 
https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212333#comment-16212333
 ] 

Jiandan Yang  commented on HDFS-7060:
-------------------------------------

[~xinwei] [~brahmareddy] [~jojochuang]  We encountered the same 
problem(branch-2.8.2), BPServiceActor#offerService blocked because 
sendHeartBeat waited for FSDataset lock, and blockReceivedAndDeleted was delay 
about 60s, and eventually  client can not close file and threw Exception 
"Unable to close file because the last blockxxx does not have enough number of 
replicas”

I think HDFS-7060 can solve our problem very well. Does this patch have any 
problem? Why does it merge into trunk.

> Avoid taking locks when sending heartbeats from the DataNode
> ------------------------------------------------------------
>
>                 Key: HDFS-7060
>                 URL: https://issues.apache.org/jira/browse/HDFS-7060
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Xinwei Qin 
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, 
> HDFS-7060.001.patch
>
>
> We're seeing the heartbeat is blocked by the monitor of {{FsDatasetImpl}} 
> when the DN is under heavy load of writes:
> {noformat}
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115)
>         - waiting to lock <0x0000000780304fb8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91)
>         - locked <0x0000000780612fd8> (a java.lang.Object)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827)
>         at java.lang.Thread.run(Thread.java:744)
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743)
>         - waiting to lock <0x0000000780304fb8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:744)
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:1006)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753)
>         - locked <0x0000000780304fb8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to