[
https://issues.apache.org/jira/browse/HDFS-14626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDFS-14626:
-------------------------------------
Description:
I have been investigating scenarios that cause decommission to hang, especially
one long-standing issue: an open block on the host which is being decommissioned
can cause the process to never complete.
Checking the history, there has been at least one change (HDFS-5579) which
greatly improved the situation, but from reading comments and support cases,
there still seem to be some scenarios where open blocks on a DN host cause the
decommission to get stuck.
No matter what I try, I have not been able to reproduce this, but I think I
have uncovered another issue that may partly explain why.
If I do the following, the nodes will decommission without any issues:
1. Create a file and write to it so it crosses a block boundary. Then there is
one complete block and one under construction block. Keep the file open, and
write a few bytes periodically.
2. Now note the nodes which the UC block is currently being written on, and
decommission them all.
3. The decommission will complete successfully.
4. Now attempt to close the open file, and it will fail with an error like the
one below, probably because decommissioned nodes are not allowed to send IBRs:
{code:java}
java.io.IOException: Unable to close file because the last block BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have enough number of replicas.
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
    at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101){code}
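The steps above can be sketched as a MiniDFSCluster test. This is only a rough
outline of what my attached test case does; the exact mechanics of triggering
decommission (and the {{decommission}} helper below) are assumptions, and the
attached patch is the authoritative version:

```java
// Hedged sketch of the reproduction, assuming a MiniDFSCluster test
// environment. API details, especially how decommission is triggered,
// may differ from the attached test case.
Configuration conf = new HdfsConfiguration();
conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 1024 * 1024);
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(3).build();
cluster.waitActive();
DistributedFileSystem fs = cluster.getFileSystem();

// 1. Write past one block boundary so the file has one complete block and
//    one under-construction (UC) block; keep the stream open.
FSDataOutputStream out = fs.create(new Path("/test/openfile"), (short) 3);
out.write(new byte[1024 * 1024 + 100]);
out.hflush();

// 2. Decommission every node the UC block is currently being written to.
//    In a real test this goes through the hosts/exclude file or the
//    DatanodeAdminManager, then waits for the DECOMMISSIONED state.
for (DataNode dn : cluster.getDataNodes()) {
  decommission(cluster, dn); // hypothetical helper
}

// 3. Decommission completes successfully - which is the surprise.

// 4. Closing the file now fails:
//    java.io.IOException: Unable to close file because the last block
//    ... does not have enough number of replicas.
out.close();
```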
Interestingly, if you recommission the nodes without restarting them before
closing the file, it will close OK, and writes to it can continue even once
decommission has completed.
I don't think this is expected behaviour - i.e. decommission should not be able
to complete on all nodes hosting the last UC block of a file.
From what I have figured out, I don't think UC blocks are considered in the
DatanodeAdminManager at all. This is because the original list of blocks it
cares about is taken from the datanode block iterator, which takes them from
the DatanodeStorageInfo objects attached to the datanode instance. I believe
UC blocks don't make it into the DatanodeStorageInfo until after they have
been completed and an IBR sent, so the decommission logic never considers them.
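A minimal illustration of where I think the gap is (identifiers below are from
my reading of trunk and may not match the code exactly):

```java
// Sketch of how DatanodeAdminManager (as I read it) seeds the list of
// blocks it tracks for a decommissioning node. Identifiers here are my
// assumptions, not a verbatim excerpt.
Iterator<BlockInfo> it = datanodeDescriptor.getBlockIterator();
while (it.hasNext()) {
  BlockInfo block = it.next();
  // The iterator walks the DatanodeStorageInfo block lists attached to
  // the datanode. A UC block is only added to a DatanodeStorageInfo once
  // it completes and the DN sends an IBR, so an open file's last block
  // never appears here and decommission ignores it entirely.
  trackBlockForDecommission(block); // hypothetical helper
}
```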
What troubles me about this explanation is: if decommission never checks for
open files, how did they previously cause it to get stuck? So I suspect I am
missing something.
I will attach a patch with a test case that demonstrates this issue. This
reproduces on trunk, and I also tested on CDH 5.8.1, which is based on the 2.6
branch but carries a lot of backports.
> Decommission all nodes hosting last block of open file succeeds unexpectedly
> -----------------------------------------------------------------------------
>
> Key: HDFS-14626
> URL: https://issues.apache.org/jira/browse/HDFS-14626
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]