[
https://issues.apache.org/jira/browse/HDFS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315034#comment-17315034
]
Tak-Lon (Stephen) Wu commented on HDFS-7924:
--------------------------------------------
{quote}
{code:java}
if (uc.getNumExpectedLocations() == 0 && uc.getNumBytes() == 0) {
{code}
{quote}
Hi guys, I found this JIRA and have a different situation, where the expected
locations (counted by `getNumExpectedLocations`) are two DataNodes that have
been terminated, but the size of the block is 0.
The background: an HDFS client opened a file for append but didn't write
anything to it. Unfortunately, we force-killed (`kill -9`) the JVM hosting that
HDFS client while it held the lease, and the blocks were considered to be
placed on those DataNodes. A few seconds later, the DataNodes hosting the
block were decommissioned/shut down normally.
As a result, the file was left open with the block(s) assigned to those two
dead hosts, and the file lease can never be recovered.
So I'd like to add a check for whether the `ExpectedStorageLocations` are all
dead, and if so, close the file (regardless of whether the block is marked as
0 length; I think the data loss has already happened at that point).
{code:java}
if ((uc.getNumExpectedLocations() == 0 && lastBlock.getNumBytes() == 0)
    || checkExpectedNodesAlive) {
  // checkExpectedNodesAlive is the new logic: it is true when the block was
  // assigned to DataNodes that are no longer alive.
{code}
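To make the proposed condition concrete, here is a minimal, self-contained Java sketch of it, outside of the NameNode. It is an illustration under assumptions, not HDFS code: `ExpectedLocation`, `allExpectedNodesDead` and `canCloseWithoutRecovery` are hypothetical names, and DataNode liveness is modeled as a plain boolean instead of the NameNode's heartbeat state. The new check deliberately ignores the block length, as proposed above.

{code:java}
import java.util.List;

public class LeaseRecoverySketch {

  // Hypothetical stand-in for one expected storage location of the last block.
  // Liveness is a plain boolean here; the real NameNode tracks it via heartbeats.
  public static final class ExpectedLocation {
    final String datanode;
    final boolean alive;

    public ExpectedLocation(String datanode, boolean alive) {
      this.datanode = datanode;
      this.alive = alive;
    }
  }

  // Hypothetical helper: true only if the block has expected locations
  // and every one of them points at a DataNode that is no longer alive.
  static boolean allExpectedNodesDead(List<ExpectedLocation> expected) {
    return !expected.isEmpty() && expected.stream().noneMatch(loc -> loc.alive);
  }

  // The proposed condition: the existing "no expected locations and 0 bytes"
  // check, OR'ed with the new dead-locations check (which ignores the length).
  static boolean canCloseWithoutRecovery(List<ExpectedLocation> expected, long numBytes) {
    boolean existingCheck = expected.isEmpty() && numBytes == 0;
    return existingCheck || allExpectedNodesDead(expected);
  }

  public static void main(String[] args) {
    // Scenario from this comment: a 0-byte block assigned to two terminated DataNodes.
    List<ExpectedLocation> bothDead = List.of(
        new ExpectedLocation("dn1", false), new ExpectedLocation("dn2", false));
    // One expected DataNode still alive: normal block recovery should proceed.
    List<ExpectedLocation> oneAlive = List.of(
        new ExpectedLocation("dn1", false), new ExpectedLocation("dn2", true));

    System.out.println(canCloseWithoutRecovery(bothDead, 0));   // true
    System.out.println(canCloseWithoutRecovery(oneAlive, 0));   // false
    System.out.println(canCloseWithoutRecovery(List.of(), 0));  // true (existing check)
  }
}
{code}

The helper requires a non-empty location list so that the "no expected locations" case still goes through the existing length check. Whether closing a file whose replica holders are all gone is acceptable (since any unflushed data is lost) is exactly the open question below.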
May I ask whether blocks should be replicated for an open file? What if the file
has not been written to and its actual length really is 0?
And what do you guys think about my proposal?
> NameNode goes into infinite lease recovery
> ------------------------------------------
>
> Key: HDFS-7924
> URL: https://issues.apache.org/jira/browse/HDFS-7924
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client, namenode
> Affects Versions: 2.6.0
> Reporter: Arpit Agarwal
> Assignee: Yi Liu
> Priority: Major
>
> We encountered an HDFS lease recovery issue. All DataNodes+NameNodes were
> restarted while a client was running. A block was created on the NN but it
> had not yet been created on DNs. The NN tried to recover the lease for the
> block on restart but was unable to do so getting into an infinite loop.