One of the issues here is keeping that list up to date. You don't want
filename operations on the NN to push out changes to datanodes (which
may not be there, after all), and you don't necessarily want every block
creation operation on a DN to force an update on what effectively
becomes a mini-db of (filename, block) mappings. Yes, it could just be a
text file, but you would still need atomic updates that don't lose the
previous version on a power failure. That update would also have to be
thread-safe, and you would have to decide whether to make it
save-immediately or lazy-write.
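As a rough sketch of what the "atomic update" part alone involves -
write to a temp file, fsync, then rename over the old copy - something
like this (paths and class names made up for illustration, this is not
existing DN code):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Illustrative only: atomically replace a per-DN (filename, block)
    // map so a power failure never leaves a half-written or missing
    // file. Relies on POSIX rename() semantics; File.renameTo won't
    // replace an existing file on Windows.
    public class BlockMapWriter {

      private final File current = new File("/data/dn/blockmap.txt");
      private final File tmp = new File("/data/dn/blockmap.txt.tmp");

      // synchronized covers the thread-safety requirement above
      public synchronized void save(String contents) throws IOException {
        FileOutputStream out = new FileOutputStream(tmp);
        try {
          out.write(contents.getBytes("UTF-8"));
          out.getChannel().force(true); // fsync before the rename
        } finally {
          out.close();
        }
        // Atomic swap: a crash at any earlier point leaves the old
        // version intact.
        if (!tmp.renameTo(current)) {
          throw new IOException("atomic rename failed");
        }
      }
    }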
In the situation in which your NN loses the entire image - and all its
backups - you are going to lose the directory tree. All the per-DN
metadata would do is leave you with some useful filenames
(2011_09_22_EMEA_paying_customers.csv.lzo) and a lot that aren't
(mapout0043.something). Someone would still have to try to recreate a
functional directory tree from them. And once you add layers on top
like HBase, life gets even more complicated, as the filenames stop
bearing any relationship to the content.
I'd go for a process that makes checkpointing NN state more reliable.
That could include making it easier for the secondary namenode to push
out updates to worker nodes that can store timestamped/version-stamped
copies of the state; it could mean improving recovery from a saved
state; and it could mean better code to make sure the secondary
namenode is actually working. You will need a secondary namenode on any
cluster of moderate size, and you will need to make sure it is
working - and test it.
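For reference, the checkpointing that exists today is driven by a
couple of coarse settings - on the 0.20-era releases, something along
these lines (property names and defaults vary between versions, so
treat this as a sketch):

    <!-- illustrative 0.20-era settings; names and defaults differ
         between releases -->
    <property>
      <name>fs.checkpoint.dir</name>
      <!-- comma-separated list: the checkpoint image is replicated
           into every directory for redundancy -->
      <value>/data/1/secondary,/data/2/secondary</value>
    </property>
    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value> <!-- seconds between checkpoints -->
    </property>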
On 28/09/11 14:27, Ravi Prakash wrote:
Hi Mirko,
It seems like a great idea to me! The architects and senior developers
might have more insight on this, though.
I think part of the reason the community might be lazy about
implementing this is that the Namenode being a single point of failure is
usually regarded as FUD. There are simple tricks (like writing the fsimage
and edits log to NFS) which can guard against some failure scenarios, and I
think most users of Hadoop are satisfied with that.
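Concretely, that trick is just a config change: dfs.name.dir
(dfs.namenode.name.dir in newer versions) takes a comma-separated list,
and the Namenode writes the fsimage and edits log to every directory in
it, so adding an NFS mount gives you an off-machine copy (paths here
are illustrative):

    <property>
      <name>dfs.name.dir</name>
      <!-- local disk plus an NFS mount; each directory gets a full
           copy of the fsimage and edits log -->
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>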
I wouldn't be too surprised if there is already a JIRA for this. But if you
could come up with a patch, I'm hopeful the community would be interested in
it.
Cheers
Ravi
2011/9/27 Mirko Kämpf <mirko.kae...@googlemail.com>
Hi,
during the Cloudera Developer Training in Berlin I came up with an idea
regarding a lost name-node.
In that case all data blocks are effectively lost, because the mapping
from filenames to blocks is gone. The solution could be to have a table
on each node which relates filenames and block_ids, and which can be
scanned after a name-node is lost. Or every block could have a kind of
backlink attached: the filename, the total number of blocks, and/or a
total hashsum. This would make it easy to recover with minimal overhead.
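Roughly, such a backlink could be a small record stored next to each
block file, something like this (only a sketch; all names are made up,
and this is not how the datanode stores blocks today):

    import java.io.Serializable;

    // Hypothetical "backlink" written alongside each block file on a
    // datanode (e.g. blk_12345.backlink next to blk_12345). After a
    // total name-node loss, a recovery tool could scan every datanode
    // and rebuild the (filename -> blocks) table from these records.
    public class BlockBacklink implements Serializable {
      final String fileName;    // full HDFS path of the owning file
      final long blockId;       // id of this block
      final int blockIndex;     // position of this block in the file
      final int totalBlocks;    // total number of blocks in the file
      final byte[] fileHashsum; // optional hashsum over the whole file

      BlockBacklink(String fileName, long blockId, int blockIndex,
                    int totalBlocks, byte[] fileHashsum) {
        this.fileName = fileName;
        this.blockId = blockId;
        this.blockIndex = blockIndex;
        this.totalBlocks = totalBlocks;
        this.fileHashsum = fileHashsum;
      }
    }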
Now I would like to ask the developer community: is there any good
reason not to do this, before I start figuring out where to begin
implementing such a feature?
Thanks,
Mirko