[
https://issues.apache.org/jira/browse/HDFS-14626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDFS-14626:
-------------------------------------
Description:
I have been investigating scenarios that cause decommission to hang, especially
one long-standing issue: an open block on the host which is being decommissioned
can cause the process to never complete.
Checking the history, there has been at least one change (HDFS-5579) which
greatly improved the situation, but from reading comments and support cases,
there still seem to be some scenarios where open blocks on a DN host cause the
decommission to get stuck.
No matter what I try, I have not been able to reproduce this, but I think I
have uncovered another issue that may partly explain why.
If I do the following, the nodes will decommission without any issues:
1. Create a file and write to it so it crosses a block boundary. Then there is
one complete block and one under construction block. Keep the file open, and
write a few bytes periodically.
2. Now note the nodes which the UC block is currently being written on, and
decommission them all.
3. The decommission will complete successfully.
4. Now attempt to close the open file, and it will fail with an error like the
one below, probably because decommissioned nodes are not allowed to send IBRs:
{code:java}
java.io.IOException: Unable to close file because the last block BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
    at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101){code}
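The steps above can be sketched as a MiniDFSCluster test. This is only a rough
outline of what my attached test case does; the exact mechanics of triggering
decommission (and the {{decommission}} helper below) are assumptions, and the
attached patch is the authoritative version:

```java
// Hedged sketch of the reproduction, assuming a MiniDFSCluster test
// environment. API details, especially how decommission is triggered,
// may differ from the attached test case.
Configuration conf = new HdfsConfiguration();
conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 1024 * 1024);
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(3).build();
cluster.waitActive();
DistributedFileSystem fs = cluster.getFileSystem();

// 1. Write past one block boundary so the file has one complete block and
//    one under-construction (UC) block; keep the stream open.
FSDataOutputStream out = fs.create(new Path("/test/openfile"), (short) 3);
out.write(new byte[1024 * 1024 + 100]);
out.hflush();

// 2. Decommission every node the UC block is currently being written to.
//    In a real test this goes through the hosts/exclude file or the
//    DatanodeAdminManager, then waits for the DECOMMISSIONED state.
for (DataNode dn : cluster.getDataNodes()) {
  decommission(cluster, dn); // hypothetical helper
}

// 3. Decommission completes successfully - which is the surprise.

// 4. Closing the file now fails:
//    java.io.IOException: Unable to close file because the last block
//    ... does not have enough number of replicas.
out.close();
```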
Interestingly, if you recommission the nodes without restarting them before
closing the file, it will close OK, and writes to it can continue even once
decommission has completed.
I don't think this is expected behaviour - i.e. decommission should not be able
to complete on all nodes hosting the last UC block of a file.
From what I have figured out, I don't think UC blocks are considered in the
DatanodeAdminManager at all. This is because the original list of blocks it
cares about is taken from the datanode block iterator, which takes them from
the DatanodeStorageInfo objects attached to the datanode instance. I believe
UC blocks don't make it into the DatanodeStorageInfo until after they have
been completed and an IBR sent, so the decommission logic never considers them.
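A minimal illustration of where I think the gap is (identifiers below are from
my reading of trunk and may not match the code exactly):

```java
// Sketch of how DatanodeAdminManager (as I read it) seeds the list of
// blocks it tracks for a decommissioning node. Identifiers here are my
// assumptions, not a verbatim excerpt.
Iterator<BlockInfo> it = datanodeDescriptor.getBlockIterator();
while (it.hasNext()) {
  BlockInfo block = it.next();
  // The iterator walks the DatanodeStorageInfo block lists attached to
  // the datanode. A UC block is only added to a DatanodeStorageInfo once
  // it completes and the DN sends an IBR, so an open file's last block
  // never appears here and decommission ignores it entirely.
  trackBlockForDecommission(block); // hypothetical helper
}
```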
What troubles me about this explanation is: if decommission never checks for
open files, how did they previously cause it to get stuck? So I suspect I am
missing something.
I will attach a patch with a test case that demonstrates this issue. This
reproduces on trunk, and I also tested on CDH 5.8.1, which is based on the 2.6
branch but carries a lot of backports.
> Decommission all nodes hosting last block of open file succeeds unexpectedly
> -----------------------------------------------------------------------------
>
> Key: HDFS-14626
> URL: https://issues.apache.org/jira/browse/HDFS-14626
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]