[
https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816802#comment-13816802
]
Chris Nauroth commented on HDFS-5366:
-------------------------------------
I tested this patch and found that blocks were never being uncached. The
NameNode never sent DNA_UNCACHE messages to the DataNode. The reason is that
{{DatanodeManager#handleHeartbeat}} makes two separate calls to
{{DatanodeManager#getCacheCommand}}: first for the DNA_CACHE set, then for
the DNA_UNCACHE set. The method internally resets the last message time for
the DataNode. This means that when it's time to send messages, the first
call for the DNA_CACHE messages succeeds and resets the clock for that
DataNode to right now. The second call for the DNA_UNCACHE messages then
always returns null, because it looks like it's not yet time to send messages.
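The failure mode is easy to reproduce in isolation. Here is a self-contained
toy model (a hypothetical {{ResendThrottleBug}} class in plain Java; it
mirrors only the timing logic, not the real HDFS classes):
{code}
// Toy model of the bug: the time check *and* the timestamp reset both live
// inside the per-call method, so only the first of two back-to-back calls
// in a single heartbeat can ever succeed.
public class ResendThrottleBug {
  static final long RESEND_INTERVAL_MS = 30_000;
  static long lastSentTimeMs = 0;

  // Mirrors the shape of the original getCacheCommand: check + reset
  // happen on every call.
  static String getCommand(String action, long nowMs) {
    if (nowMs - lastSentTimeMs < RESEND_INTERVAL_MS) {
      return null;             // looks like it's not time to send yet
    }
    lastSentTimeMs = nowMs;    // resets the clock for this DataNode
    return action;
  }

  public static void main(String[] args) {
    long now = 100_000;
    System.out.println(getCommand("DNA_CACHE", now));    // DNA_CACHE
    System.out.println(getCommand("DNA_UNCACHE", now));  // null -- never sent
  }
}
{code}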
To solve this, we need to set the DataNode's last caching directive sent time
just once, after calculating both the DNA_CACHE and DNA_UNCACHE commands. I
changed the code as follows to do this. Feel free to incorporate it into the
next patch. (I'm not uploading a new patch right now, because I don't want to
disentangle it from the HDFS-5394 patch applied in my environment.)
In {{DatanodeManager#handleHeartbeat}}:
{code}
long monoTimeMs = Time.monotonicNow();
if (sendCachingCommands) {
  if ((monoTimeMs - nodeinfo.getLastCachingDirectiveSentTimeMs()) >=
      timeBetweenResendingCachingDirectivesMs) {
    DatanodeCommand pendingCacheCommand = getCacheCommand(
        nodeinfo.getPendingCached(), nodeinfo,
        DatanodeProtocol.DNA_CACHE, blockPoolId);
    if (pendingCacheCommand != null) {
      cmds.add(pendingCacheCommand);
    }
    DatanodeCommand pendingUncacheCommand = getCacheCommand(
        nodeinfo.getPendingUncached(), nodeinfo,
        DatanodeProtocol.DNA_UNCACHE, blockPoolId);
    if (pendingUncacheCommand != null) {
      cmds.add(pendingUncacheCommand);
    }
    nodeinfo.setLastCachingDirectiveSentTimeMs(monoTimeMs);
  }
}
{code}
And {{DatanodeManager#getCacheCommand}}:
{code}
/**
 * Convert a CachedBlocksList into a DatanodeCommand with a list of blocks.
 *
 * @param list The {@link CachedBlocksList}. This function clears the list.
 * @param datanode The datanode.
 * @param action The action to perform in the command.
 * @param poolId The block pool id.
 * @return A DatanodeCommand to be sent back to the DN, or null if there is
 *         nothing to be done.
 */
private DatanodeCommand getCacheCommand(CachedBlocksList list,
    DatanodeDescriptor datanode, int action, String poolId) {
  int length = list.size();
  if (length == 0) {
    return null;
  }
  // Read and clear the existing cache commands.
  long[] blockIds = new long[length];
  int i = 0;
  for (Iterator<CachedBlock> iter = list.iterator(); iter.hasNext(); ) {
    CachedBlock cachedBlock = iter.next();
    blockIds[i++] = cachedBlock.getBlockId();
    iter.remove();
  }
  return new BlockIdCommand(action, poolId, blockIds);
}
{code}
I re-tested with these changes, and blocks were uncached as expected.
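The fixed ordering can be modeled the same way (again a toy sketch with a
hypothetical {{ResendThrottleFixed}} class, not HDFS code): do the time check
once per heartbeat, gather both command sets, then reset the timestamp.
{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of the fix: one time check and one timestamp reset per
// heartbeat, performed after *both* command sets have been gathered.
public class ResendThrottleFixed {
  static final long RESEND_INTERVAL_MS = 30_000;
  static long lastSentTimeMs = 0;

  static List<String> heartbeat(long nowMs) {
    List<String> cmds = new ArrayList<>();
    if (nowMs - lastSentTimeMs >= RESEND_INTERVAL_MS) {
      cmds.add("DNA_CACHE");     // real code only adds non-null commands
      cmds.add("DNA_UNCACHE");
      lastSentTimeMs = nowMs;    // reset once, after computing both commands
    }
    return cmds;
  }

  public static void main(String[] args) {
    System.out.println(heartbeat(100_000));  // [DNA_CACHE, DNA_UNCACHE]
    System.out.println(heartbeat(110_000));  // [] -- within resend interval
    System.out.println(heartbeat(140_000));  // [DNA_CACHE, DNA_UNCACHE]
  }
}
{code}
Note how a heartbeat inside the resend interval sends nothing, while both
command types are always computed together before the clock is reset.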
> recaching improvements
> ----------------------
>
> Key: HDFS-5366
> URL: https://issues.apache.org/jira/browse/HDFS-5366
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: HDFS-4949
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5366-caching.001.patch
>
>
> There are a few things about our HDFS-4949 recaching strategy that could be
> improved.
> * We should monitor the DN's maximum and current mlock'ed memory consumption
> levels, so that we don't ask the DN to do stuff it can't.
> * We should not try to initiate caching on stale or decommissioning DataNodes
> (although we should not recache things stored on such nodes until they're
> declared dead).
> * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few
> times before giving up. Currently, we only send it once.