Todd Lipcon created HDFS-3599:
---------------------------------

             Summary: Better expose when under-construction files are preventing DN decommission
                 Key: HDFS-3599
                 URL: https://issues.apache.org/jira/browse/HDFS-3599
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: data-node, name-node
    Affects Versions: 3.0.0
            Reporter: Todd Lipcon


Filing on behalf of Konstantin Olchanski:
{quote}
I have been trying to decommission a data node, but the process
stalled. I followed the correct instructions, saw my node listed under
"Decommissioning Nodes", and watched the "Under Replicated Blocks" count
decrease. But the count went down to "1" and the decommissioning process
stalled. There was no visible activity anywhere; nothing was happening
(well, maybe something complained in some hidden log file somewhere,
but I did not look).

It turns out that I had some files stuck in "OPENFORWRITE" mode,
as reported by "hdfs fsck / -openforwrite -files -blocks -locations -racks":

{code}
/users/trinat/data/.fuse_hidden0000177e00000002 0 bytes, 0 block(s), OPENFORWRITE:  OK
/users/trinat/data/.fuse_hidden0000178d00000003 0 bytes, 0 block(s), OPENFORWRITE:  OK
/users/trinat/data/.fuse_hidden00001da300000004 0 bytes, 1 block(s), OPENFORWRITE:  OK
0. BP-88378204-142.90.119.126-1340494203431:blk_6980480609696383665_20259{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[142.90.111.72:50010|RBW], ReplicaUnderConstruction[142.90.119.162:50010|RBW], ReplicaUnderConstruction[142.90.119.126:50010|RBW]]} len=0 repl=3 [/detfac/142.90.111.72:50010, /isac2/142.90.119.162:50010, /isac2/142.90.119.126:50010]
{code}

After I deleted those files, the decommission process completed successfully.

Perhaps one could add some visible indication on the HDFS status web page
that the decommission process is stalled, and maybe report why it is stalled?

Maybe the number of "OPENFORWRITE" files should be listed on the status page
next to the "Number of Under-Replicated Blocks"? (Since I know that nobody is
writing to my HDFS, a non-zero count would give me a clue that something is wrong.)
{quote}
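Until the status page exposes this, the check the reporter ran by hand can be scripted. The following is a minimal Python sketch, not part of HDFS, that reuses the fsck command quoted above and collects the files flagged OPENFORWRITE. The names `FSCK_CMD`, `open_for_write_paths`, and `run_fsck` are hypothetical, and the parser assumes the line layout shown in the sample output (file lines start with the absolute path and carry the OPENFORWRITE marker), which may differ between Hadoop versions.

```python
import subprocess
from typing import List

# The fsck invocation from the report above; requires an HDFS client on PATH.
FSCK_CMD = ["hdfs", "fsck", "/", "-openforwrite", "-files",
            "-blocks", "-locations", "-racks"]


def open_for_write_paths(fsck_output: str) -> List[str]:
    """Return the paths of files that fsck flags as OPENFORWRITE.

    Assumption: each file line starts with its absolute path and contains
    the OPENFORWRITE marker, as in the sample output. Block lines such as
    "0. BP-...:blk_..." do not start with "/" and are skipped.
    """
    paths = []
    for line in fsck_output.splitlines():
        line = line.strip()
        if line.startswith("/") and "OPENFORWRITE" in line:
            # The path is the first whitespace-separated token.
            paths.append(line.split()[0])
    return paths


def run_fsck() -> str:
    """Run fsck over the whole namespace and return its standard output."""
    result = subprocess.run(FSCK_CMD, capture_output=True, text=True, check=False)
    return result.stdout
```

On a cluster that nobody is writing to, a non-zero `len(open_for_write_paths(run_fsck()))` is the kind of clue the reporter suggests showing next to "Number of Under-Replicated Blocks", and the returned paths (here, the `.fuse_hidden*` files) point at what is holding up decommission.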

--
This message is automatically generated by JIRA.