NameNode - didn't persist the edit log

Guy Doulberg Thu, 15 Dec 2011 01:17:15 -0800

Hi guys,

We recently had the following problem on our production cluster:


The filesystem containing the edit log and fsimage had no free inodes.

As a result, the namenode was unable to obtain inodes for the new fsimage and edit log after a checkpoint was reached, even though the previous files had already been released. Unfortunately, we had no monitoring on inode usage, so the namenode ran in this state for a few hours.
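For anyone in the same position, a basic inode check would have caught this early. Below is a minimal sketch (not something we were running) that assumes a POSIX system where Python's `os.statvfs` reports inode counts. The function names and the 5% threshold are hypothetical choices, not Hadoop settings.

```python
import os


def free_inode_ratio(path: str) -> float:
    """Fraction of inodes still available to unprivileged users on the
    filesystem that contains `path`."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems (e.g. btrfs) report no fixed inode total;
        # treat them as never running out of inodes.
        return 1.0
    return st.f_favail / st.f_files


def inode_report(path: str, min_free_ratio: float = 0.05) -> str:
    """One-line status for a cron job or monitoring agent.
    The 5% default is a hypothetical threshold."""
    ratio = free_inode_ratio(path)
    status = "OK" if ratio >= min_free_ratio else "CRITICAL"
    return f"{status}: {ratio:.1%} of inodes free on {path}"
```

A cron job could call `inode_report` on the directory configured as `dfs.name.dir` and raise an alert whenever the result does not start with "OK".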


We noticed the failure on the namenode's DFS status page.

However, the namenode did not enter safe mode, so none of the writes made during that period could be persisted to the edit log.

After discovering the problem, we freed inodes and the filesystem seemed to be fine again. We then tried to force the namenode to persist its edit log, without success.

Eventually, we restarted the namenode, which of course caused us to lose all the data written to HDFS during those few hours. (Fortunately, we had a backup of the recent writes, so we restored the data from there.)


This situation raises some serious concerns:

1. How come the namenode detected a failure to persist its edit log but did not enter safe mode? (The exception was logged only at WARN severity, not as CRITICAL.)
2. Why, after we freed inodes, could we still not get the namenode to persist its edit log? Perhaps there should be a CLI command that lets us force the namenode to persist its edit log.


Do you know of a JIRA already open for these issues, or should I open one?

Thanks,
Guy
