Hi @Mirko: please file a JIRA. This seems like an appropriate time.
@Steve: If we store the absolute filenames (i.e. the whole path), would we
still have the problem you outlined in the 2nd para? I do agree the updates
would have to be pushed out, and that might be cumbersome, but hey, we are
processing heartbeats from the datanodes every 3 seconds anyway. Maybe we
can piggyback those updates? I'm sure there are better solutions as well,
and I don't think these problems are show-stoppers. If this solution helps
to decrease the FUD, then I think it might be worth it (apart from its own
merit).
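To make the piggybacking concrete, here is a rough sketch in plain Java. It
is entirely hypothetical -- none of these classes or names exist in Hadoop --
but it shows the shape of the idea: the NN queues per-datanode path updates
and drains each queue into the response of that node's next ~3-second
heartbeat, instead of pushing changes out eagerly.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class HeartbeatPiggyback {

    /** One pending update for a datanode's local (block, filename) table. */
    record PathUpdate(long blockId, String newAbsolutePath) {}

    // Per-datanode queues of updates awaiting that node's next heartbeat.
    private final Map<String, ConcurrentLinkedQueue<PathUpdate>> pendingByDn =
            new ConcurrentHashMap<>();

    /** Called on NN create/rename: enqueue for every DN holding the block. */
    public void onPathChanged(long blockId, String newPath, List<String> dnIds) {
        for (String dn : dnIds) {
            pendingByDn.computeIfAbsent(dn, k -> new ConcurrentLinkedQueue<>())
                       .add(new PathUpdate(blockId, newPath));
        }
    }

    /** Called when a heartbeat arrives: drain that DN's queue into the reply. */
    public List<PathUpdate> drainForHeartbeatResponse(String dnId) {
        List<PathUpdate> batch = new ArrayList<>();
        ConcurrentLinkedQueue<PathUpdate> q = pendingByDn.get(dnId);
        if (q != null) {
            for (PathUpdate u; (u = q.poll()) != null; ) {
                batch.add(u);
            }
        }
        return batch; // usually empty, so heartbeats stay cheap
    }

    public static void main(String[] args) {
        HeartbeatPiggyback nn = new HeartbeatPiggyback();
        nn.onPathChanged(42L, "/data/emea_paying_customers.csv.lzo",
                List.of("dn-1", "dn-2"));
        System.out.println("dn-1 gets:  " + nn.drainForHeartbeatResponse("dn-1"));
        System.out.println("dn-1 again: " + nn.drainForHeartbeatResponse("dn-1"));
    }
}

And for the atomicity worry Steve raised: the per-DN table would be small,
so the usual write-temp-then-rename dance should cover the power-failure
case. Again, just a sketch:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicTableWriter {

    public static void saveAtomically(Path table, String contents)
            throws IOException {
        // 1. Write the new version to a temp file next to the real one.
        Path tmp = table.resolveSibling(table.getFileName() + ".tmp");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        // 2. Force it to disk so the bytes survive a power failure.
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
        // 3. Atomically rename over the old copy: a crash leaves either the
        //    old table or the new one, never a torn file.
        Files.move(tmp, table, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        saveAtomically(Paths.get("block-map.txt"),
                "blk_42\t/data/emea_paying_customers.csv.lzo\n");
    }
}

Rename is atomic on POSIX filesystems, so a crash mid-update leaves either
the old table or the new one, never a half-written file. Save-immediately vs
lazy-write then just becomes a question of how often saveAtomically() gets
called.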
Just my $.02

Ravi

On Wed, Sep 28, 2011 at 9:06 AM, Steve Loughran <ste...@apache.org> wrote:

> One of the issues here is keeping that list up to date. You don't want
> filename operations on the NN to push out changes to datanodes (which may
> not be there, after all), and you don't necessarily want every block
> creation operation on a DN to force an update on what effectively becomes
> a mini-db of (filename, block) mappings. Yes, it could just be a text
> file, but you still need to push out atomic updates which don't lose the
> previous version on a power failure. That update would have to be thread
> safe, and you would have to decide whether to make it save-immediately vs
> lazy-write.
>
> In the situation in which your NN loses the entire image -and all its
> backups- you are going to lose the directory tree. All the per-DN
> metadata would do is leave you with some useful filenames
> (2011_09_22_EMEA_paying_customers.csv.lzo) and lots that aren't
> (mapout0043.something). Someone is still going to have to try and
> recreate what appears to be a functional directory tree from it. Then
> once you add layers on top like HBase, life is even more complicated, as
> the filenames will stop bearing any relationship to the content.
>
> I'd go for a process that makes checkpointing NN state more reliable.
> That could include making it easier for the secondary namenode to push
> out updates to worker nodes in the system that can store
> timestamped/version-stamped copies of the state; it could be improving
> recovery of state, and it could be better code to make sure that the
> secondary namenode is actually working. Because you will need a secondary
> namenode on any cluster of moderate size, and you will need to make sure
> it is working -and test it-.
>
> On 28/09/11 14:27, Ravi Prakash wrote:
>
>> Hi Mirko,
>>
>> It seems like a great idea to me!! The architects and senior developers
>> might have some more insight on this, though.
>>
>> I think part of the reason the community might be lazy about
>> implementing this is that the Namenode being a single point of failure
>> is usually regarded as FUD. There are simple tricks (like writing the
>> fsimage and edits log to NFS) which can guard against some failure
>> scenarios, and I think most users of Hadoop are satisfied with that.
>>
>> I wouldn't be too surprised if there is already a JIRA for this. But if
>> you could come up with a patch, I'm hopeful the community would be
>> interested in it.
>>
>> Cheers
>> Ravi
>>
>> 2011/9/27 Mirko Kämpf <mirko.kaempf@googlemail.com>
>>
>>> Hi,
>>> during the Cloudera Developer Training in Berlin I came up with an
>>> idea regarding a lost name-node, since in that case all data blocks
>>> are effectively lost. The solution could be to keep, on each datanode,
>>> a table which relates filenames and block IDs, and which can be
>>> scanned after the name-node is lost. Or every block could have a kind
>>> of backlink to the filename attached, plus the total number of blocks
>>> and/or a total hash sum. This would make it easy to recover with
>>> minimal overhead.
>>>
>>> Now I would like to ask the developer community if there is any good
>>> reason not to do this, before I start to figure out where to begin
>>> implementing such a feature.
>>>
>>> Thanks,
>>> Mirko
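P.S. Mirko: on the backlink variant, if every block carried, say, a tiny
sidecar file naming the original filename, the block's index, and the total
number of blocks, the recovery pass on a datanode would be close to trivial.
A sketch under exactly that invented layout -- nothing here is a real HDFS
on-disk format:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class BacklinkRecovery {

    /**
     * Assumes each block has a sidecar file "blk_<id>.meta" whose single
     * line is "<absolute-filename>\t<blockIndex>\t<totalBlocks>".
     * Walks a DN's data dir and rebuilds a filename -> blocks table.
     */
    public static Map<String, List<String>> scan(Path dataDir)
            throws IOException {
        Map<String, List<String>> fileToBlocks = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(dataDir)) {
            paths.filter(p -> p.getFileName().toString().endsWith(".meta"))
                 .forEach(p -> {
                     try {
                         String[] f = Files.readAllLines(p).get(0).split("\t");
                         fileToBlocks
                             .computeIfAbsent(f[0], k -> new ArrayList<>())
                             .add(p.getFileName() + " (" + f[1] + " of " + f[2] + ")");
                     } catch (IOException e) {
                         System.err.println("unreadable meta file: " + p);
                     }
                 });
        }
        return fileToBlocks;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        scan(dir).forEach((file, blocks) ->
                System.out.println(file + " -> " + blocks));
    }
}

Run that over each datanode's data directory and you get back a sorted
filename -> blocks listing to seed the rebuild of the directory tree.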