[
https://issues.apache.org/jira/browse/HDFS-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suresh Srinivas updated HDFS-2815:
----------------------------------
Target Version/s: 0.24.0, 0.23.2 (was: 0.23.2, 0.24.0)
Affects Version/s: 1.1.0
> Namenode is not coming out of safemode when we perform ( NN crash + restart )
> . Also FSCK report shows blocks missed.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-2815
> URL: https://issues.apache.org/jira/browse/HDFS-2815
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Priority: Critical
> Attachments: HDFS-2815.patch
>
>
> While testing HA (internal) with continuous switches at roughly 5-minute
> intervals, I found some *blocks missing*, and the namenode went into safemode
> after the next switch.
>
> After analysis, I found that these files had already been deleted by clients,
> but I do not see any delete command logs in the namenode log files. The
> namenode had nevertheless added those blocks to invalidateSets, and the DNs
> deleted the blocks.
> When the namenode restarted, it went into safemode and kept expecting more
> blocks before it could come out of safemode.
> The reason could be that the file is deleted in memory and its blocks are
> added to invalidates, and only after this does the namenode try to sync the
> edits into the editlog file. By that time the NN has already asked the DNs to
> delete those blocks. The namenode then shuts down before persisting to the
> editlog (the log is behind).
> Because of this, we may not get the INFO logs about the delete, and when we
> restart the Namenode (in my scenario it is again a switch), the Namenode still
> expects these deleted blocks, since the delete request was never persisted
> into the editlog.
> I reproduced this scenario with debug points. *I feel we should not add
> the blocks to invalidates before persisting into the editlog*.
> Note: for the switch, we used kill -9 (force kill).
> I am currently on version 0.20.2. The same was verified on 0.23 in a normal
> crash + restart scenario.
>
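The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual HDFS code or the attached patch: `ToyNameNode`, `ToyDataNodes`, `deleteInvalidateFirst`, `deleteSyncFirst` and `missingAfterCrash` are hypothetical names, each file is assumed to have exactly one block, and a `kill -9` is modelled as returning before the edit is synced.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteOrderingSketch {

    /** Hypothetical stand-in for all DataNodes; their blocks survive a NameNode crash. */
    static final class ToyDataNodes {
        final Set<String> blocks = new HashSet<>();
    }

    /** Hypothetical stand-in for the NameNode; not the real FSNamesystem. */
    static final class ToyNameNode {
        final Set<String> files = new HashSet<>();            // in-memory namespace
        final List<String> unsyncedEdits = new ArrayList<>(); // lost on a crash
        final List<String> durableEdits;                      // the on-disk editlog
        final ToyDataNodes dns;

        /** Starting a NameNode replays whatever reached the durable editlog. */
        ToyNameNode(List<String> durableEdits, ToyDataNodes dns) {
            this.durableEdits = durableEdits;
            this.dns = dns;
            for (String edit : durableEdits) {
                String[] parts = edit.split(" ", 2);
                if (parts[0].equals("ADD")) {
                    files.add(parts[1]);
                } else if (parts[0].equals("DELETE")) {
                    files.remove(parts[1]);
                }
            }
        }

        /** Creates a file whose single block is stored on the DataNodes. */
        void create(String file) {
            files.add(file);
            durableEdits.add("ADD " + file);
            dns.blocks.add(blockOf(file));
        }

        void logSync() {
            durableEdits.addAll(unsyncedEdits);
            unsyncedEdits.clear();
        }

        /** Order from the report: blocks are invalidated before the edit is synced. */
        void deleteInvalidateFirst(String file, boolean crashBeforeSync) {
            files.remove(file);
            unsyncedEdits.add("DELETE " + file);
            dns.blocks.remove(blockOf(file)); // DNs act on the invalidate command
            if (crashBeforeSync) {
                return; // models kill -9: the edit never reaches disk
            }
            logSync();
        }

        /** Proposed order: sync the edit first, then invalidate the blocks. */
        void deleteSyncFirst(String file, boolean crashBeforeSync) {
            files.remove(file);
            unsyncedEdits.add("DELETE " + file);
            if (crashBeforeSync) {
                return; // models kill -9: nothing was invalidated yet
            }
            logSync();
            dns.blocks.remove(blockOf(file));
        }

        /** Blocks the namespace expects but no DataNode holds. */
        Set<String> missingBlocks() {
            Set<String> missing = new HashSet<>();
            for (String f : files) {
                if (!dns.blocks.contains(blockOf(f))) {
                    missing.add(blockOf(f));
                }
            }
            return missing;
        }
    }

    static String blockOf(String file) {
        return "blk_" + file;
    }

    /** Create, delete, crash before sync, restart; returns missing blocks after restart. */
    static int missingAfterCrash(boolean syncFirst) {
        List<String> disk = new ArrayList<>();
        ToyDataNodes dns = new ToyDataNodes();
        ToyNameNode nn = new ToyNameNode(disk, dns);
        nn.create("/a");
        if (syncFirst) {
            nn.deleteSyncFirst("/a", true);
        } else {
            nn.deleteInvalidateFirst("/a", true);
        }
        ToyNameNode restarted = new ToyNameNode(disk, dns);
        return restarted.missingBlocks().size();
    }

    public static void main(String[] args) {
        System.out.println("invalidate-first missing blocks: " + missingAfterCrash(false)); // 1
        System.out.println("sync-first missing blocks: " + missingAfterCrash(true));        // 0
    }
}
```

In this model the invalidate-first order leaves the restarted NameNode expecting a block that no DataNode holds, which corresponds to the stuck-in-safemode symptom. With the sync-first order, a crash before the sync only means the delete is lost and the file reappears intact.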