[ 
https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533201
 ] 

Konstantin Shvachko commented on HADOOP-1999:
---------------------------------------------

finalize removes the hard links previously created by upgrade. The removal is done 
in a separate thread, but if there are a lot of blocks, 
the data-nodes are likely to be blocked on I/O, so data transmission will 
be slow. This is what you observed here. 
A solution would be to remove the links lazily, e.g. remove 100 files per 
second or so. Then finalizing will go slower, but 
the data-nodes will be able to proceed with normal activities.
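
A rough sketch of what such lazy removal could look like is below. It is only an 
illustration of the throttling idea, not the actual data-node code: the class name 
ThrottledLinkRemover, the method removeLazily and its parameters are all 
hypothetical, and the batch size and pause are assumptions to be tuned.

```java
import java.io.File;
import java.util.List;

// Hypothetical helper, not part of Hadoop: deletes files in small batches
// and sleeps between batches so other disk I/O can make progress.
final class ThrottledLinkRemover {

    private ThrottledLinkRemover() {}

    /**
     * Deletes the given files, pausing after every filesPerBatch attempts.
     *
     * @param files         files (e.g. hard links) to remove
     * @param filesPerBatch number of delete attempts between pauses (assumed tunable)
     * @param pauseMillis   how long to sleep after each full batch
     * @return the number of files that were actually deleted
     */
    static int removeLazily(List<File> files, int filesPerBatch, long pauseMillis)
            throws InterruptedException {
        if (filesPerBatch <= 0) {
            throw new IllegalArgumentException("filesPerBatch must be positive");
        }
        int deleted = 0;
        int attemptedInBatch = 0;
        for (File f : files) {
            if (f.delete()) {
                deleted++;
            }
            attemptedInBatch++;
            if (attemptedInBatch >= filesPerBatch) {
                // Yield the disk to regular block reads and writes.
                attemptedInBatch = 0;
                Thread.sleep(pauseMillis);
            }
        }
        return deleted;
    }
}
```

With filesPerBatch = 100 and pauseMillis = 1000 this would remove at most about 
100 files per second, which is the rate mentioned above.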

The jstack you attached: I do not see the data-node doing any file deletes. 
Are you sure this thread dump was taken 
during finalize? I see that one of the threads is doing DU though. Could the 
slowdown be related to HADOOP-1946?
Before that was fixed I saw drastic slowdowns of data-nodes; some of them 
would become dead even under insignificant load. 
Finalize would make things even worse.

Missing blocks: I suspect that you get these because many I/O operations did not 
complete. Some blocks were not replicated,
some files were not closed.

> DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-1999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1999
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.0
>         Environment: Sep 14 nightly build
>            Reporter: Christian Kunz
>            Priority: Critical
>         Attachments: jstack.datanode
>
>
> I restarted the namenode with the -upgrade option, started a few scripts running 
> the hadoop command line utility to upload a few files into dfs, and at some 
> point ran
> hadoop dfsadmin -finalizeUpgrade.
> At this point all the dfs clients I had started before got stuck during block 
> transmission.
