[
https://issues.apache.org/jira/browse/HDFS-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aaron T. Myers updated HDFS-4656:
---------------------------------
Attachment: HDFS-4656.patch
Trivial patch attached. To test this I added some extra logging both to the
heartbeat loop itself and inside the {{if}} block mentioned in the issue
description. Without the patch I saw output like this:
{noformat}
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:37 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:20:37 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_2854604910635513445_147059
13/04/01 17:20:38 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-7126490035827800141_147054
13/04/01 17:20:39 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-7998009805522854247_147131
13/04/01 17:20:40 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:40 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:40 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:40 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:20:40 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-7289451447776161689_147106
13/04/01 17:20:41 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-3895917754682539115_147098
13/04/01 17:20:42 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_1182110057394126316_147123
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:20:43 INFO datanode.DataNode: ====> actually performing heartbeat
{noformat}
With the patch I see output like this:
{noformat}
13/04/01 17:22:45 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:45 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:46 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-5664468450728548846_148227
13/04/01 17:22:47 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_5505984176030032691_148190
13/04/01 17:22:48 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-847450075981641452_148208
13/04/01 17:22:48 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:48 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:49 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-5755422789655200141_148209
13/04/01 17:22:50 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-4066581005605116272_148148
13/04/01 17:22:51 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-3643658690040746645_148232
13/04/01 17:22:51 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:51 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:52 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_7451729659994128063_148176
13/04/01 17:22:53 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_7633597047916157767_148158
13/04/01 17:22:54 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_8386436294983826146_148194
13/04/01 17:22:54 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:54 INFO datanode.DataNode: ====> actually performing heartbeat
13/04/01 17:22:55 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-4489059851090749136_148224
13/04/01 17:22:56 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-5331285809567165401_148151
13/04/01 17:22:57 INFO datanode.BlockPoolSliceScanner: Verification succeeded
for BP-193129671-172.21.3.106-1357591728674:blk_-4611051254925118595_148196
13/04/01 17:22:57 INFO datanode.DataNode: ====> heartbeat loop...
13/04/01 17:22:57 INFO datanode.DataNode: ====> actually performing heartbeat
{noformat}
No tests are included since the patch is trivial and would be difficult to
test in a non-convoluted manner.
> DN heartbeat loop can be briefly tight
> --------------------------------------
>
> Key: HDFS-4656
> URL: https://issues.apache.org/jira/browse/HDFS-4656
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.0.4-alpha
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Priority: Minor
> Attachments: HDFS-4656.patch
>
>
> The DN hearbeat loop looks roughly like this:
> {code}
> if (now - timeOfLastHeartbeat > configuredHeartbeatInterval) {
> // do heartbeat
> }
> timeToWait = configuredHeartbeatInterval - (now - timeOfLastHeartbeat)
> sleep(timeToWait)
> {code}
> The trouble is that since we sleep for exactly the heartbeat interval, and
> then check to see if we have waited _more_ than that heartbeat interval, we
> will very often have waited exactly the heartbeat interval (in millis), and
> not more than it. In this case we will skip actually performing the heartbeat
> and will calculcate timeToWait as being 0ms. The DN heartbeat loop will then
> loop tightly for 1ms. The solution is just to change the "{{>}}" in the code
> above to "{{>=}}".
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira