Hi,

@Mirko: Please file a JIRA. This seems an appropriate time.

@Steve: If we store the absolute filenames (i.e. the whole path), would we
still have the problem you outlined in the 2nd para? I do agree updating
would have to be pushed out and that might be cumbersome, but hey, we are
processing heartbeats from the datanodes every 3 seconds anyway. Maybe we
can piggyback those updates? I'm sure there are better solutions as well and
I don't think these problems are show-stoppers. If this solution helps to
decrease the FUD, then I think it might be worth it (apart from its other merits).
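
To make the piggybacking concrete, here is a rough sketch of what the NN
side could queue up per datanode (the names are completely made up; this
is not the real DatanodeProtocol):

    // Made-up sketch: queue filename/block-map deltas per datanode and
    // drain them into the next heartbeat response, instead of doing a
    // separate push for every rename. Nothing here is real Hadoop code.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;

    class BlockPathUpdate {
        final long blockId;
        final String absolutePath;   // the whole path, as suggested above
        BlockPathUpdate(long blockId, String absolutePath) {
            this.blockId = blockId;
            this.absolutePath = absolutePath;
        }
    }

    class PerDatanodeUpdateQueue {
        private final ConcurrentLinkedQueue<BlockPathUpdate> pending =
                new ConcurrentLinkedQueue<BlockPathUpdate>();

        // Called from NN filename operations (rename, delete, ...).
        void enqueue(long blockId, String absolutePath) {
            pending.add(new BlockPathUpdate(blockId, absolutePath));
        }

        // Called while building the ~3s heartbeat response for this DN;
        // the deltas ride along for free, and a datanode that was down
        // simply drains its backlog (or gets a full resync) on return.
        List<BlockPathUpdate> drain() {
            List<BlockPathUpdate> batch = new ArrayList<BlockPathUpdate>();
            BlockPathUpdate u;
            while ((u = pending.poll()) != null) {
                batch.add(u);
            }
            return batch;
        }
    }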

Just my $.02
Ravi

On Wed, Sep 28, 2011 at 9:06 AM, Steve Loughran <ste...@apache.org> wrote:

>
> One of the issues here is keeping that list up to date. You don't want
> filename operations on the NN to push out changes to datanodes (which may
> not be there, after all), and you don't necessarily want every block
> creation operation on a DN to force an update on what effectively becomes a
> mini-db of (filename, block) mappings. Yes, it could just be a text file,
> but you still need to push out atomic updates which don't lose the previous
> version on a power failure. That update would have to be thread-safe, and
> you would have to decide whether to make it save-immediately or lazy-write.
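>
> The usual dance for that is write-to-temp, sync, then rename; roughly
> (an untested sketch, nothing from the actual Hadoop codebase):
>
>   // Crash-safe replace of a small mapping file: write a temp copy,
>   // force it to disk, then rename it over the old one. rename() is
>   // atomic on POSIX filesystems, so a power failure leaves either
>   // the old version or the new one, never a half-written file.
>   import java.io.File;
>   import java.io.FileOutputStream;
>   import java.io.IOException;
>
>   class AtomicSave {
>       // synchronized, so concurrent savers don't race on the temp file
>       static synchronized void save(File target, byte[] contents)
>               throws IOException {
>           File tmp = new File(target.getParentFile(),
>                               target.getName() + ".tmp");
>           FileOutputStream out = new FileOutputStream(tmp);
>           try {
>               out.write(contents);
>               out.getFD().sync();      // make sure the bytes hit the disk
>           } finally {
>               out.close();
>           }
>           if (!tmp.renameTo(target)) { // atomic replace on POSIX
>               throw new IOException("rename failed: " + tmp);
>           }
>       }
>   }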
>
> In the situation in which your NN loses the entire image -and all its
> backups- you are going to lose the directory tree. All the per-DN metadata
> would do is leave you with some useful filenames
> (2011_09_22_EMEA_paying_customers.csv.lzo) and lots that aren't
> (mapout0043.something). Someone
> is still going to have to try and recreate what appears to be a functional
> directory tree from it. Then once you add layers on top like HBase, life is
> even more complicated as the filenames will stop bearing any relationship to
> the content.
>
> I'd go for a process that makes checkpointing NN state more reliable. That
> could include making it easier for the secondary namenode to push out
> updates to worker nodes in the system that can store timestamped or
> version-stamped copies of the state; it could be improving recovery of
> state, and it could be better code to make sure that the secondary
> Namenode is actually working. You will need a secondary namenode on any
> cluster of moderate size, and you will need to make sure it is working
> (and test it).
>
>
> On 28/09/11 14:27, Ravi Prakash wrote:
>
>> Hi Mirko,
>>
>> It seems like a great idea to me!! The architects and senior developers
>> might have some more insight on this though.
>>
>> I think part of the reason why the community might be lazy about
>> implementing this is that the Namenode being a single point of failure is
>> usually regarded as FUD. There are simple tricks (like writing the fsimage
>> and editslog to NFS) which can guard against some failure scenarios, and I
>> think most users of hadoop are satisfied with that.
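>>
>> (Concretely, that trick is just listing a local directory and an NFS
>> mount in hdfs-site.xml; the paths here are made up:
>>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>/local/hadoop/name,/mnt/nfs/hadoop/name</value>
>>   </property>
>>
>> The namenode then writes its image and edits to both directories.)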
>>
>> I wouldn't be too surprised if there is already a JIRA for this. But if you
>> could come up with a patch, I'm hopeful the community would be interested
>> in it.
>>
>> Cheers
>> Ravi
>>
>> 2011/9/27 Mirko Kämpf <mirko.kaempf@googlemail.com>
>>
>>> Hi,
>>> during the Cloudera Developer Training at Berlin I came up with an idea
>>> regarding a lost name-node, since in that case all data blocks are lost.
>>> The solution could be to have a table on each datanode which relates
>>> filenames and block_ids, and which can be scanned after a name-node is
>>> lost. Or every block could have a kind of backlink attached, pointing to
>>> the filename plus the total number of blocks and/or a total hashsum.
>>> This would make it easy to recover with minimal overhead.
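>>>
>>> A toy version of the recovery scan, assuming each block file gets a
>>> ".backlink" sidecar whose first line is the absolute filename it
>>> belongs to (the sidecar format is completely made up):
>>>
>>>   import java.io.BufferedReader;
>>>   import java.io.File;
>>>   import java.io.FileReader;
>>>   import java.io.IOException;
>>>   import java.util.ArrayList;
>>>   import java.util.List;
>>>   import java.util.Map;
>>>   import java.util.TreeMap;
>>>
>>>   class BacklinkScan {
>>>       public static void main(String[] args) throws IOException {
>>>           // filename -> block ids found for it on this datanode
>>>           Map<String, List<String>> fileToBlocks =
>>>                   new TreeMap<String, List<String>>();
>>>           File[] entries = new File(args[0]).listFiles();
>>>           if (entries == null) return;   // args[0]: a DN block directory
>>>           for (File f : entries) {
>>>               if (!f.getName().endsWith(".backlink")) continue;
>>>               BufferedReader r = new BufferedReader(new FileReader(f));
>>>               String path = r.readLine();  // line 1: absolute filename
>>>               r.close();
>>>               String blockId = f.getName().replace(".backlink", "");
>>>               List<String> blocks = fileToBlocks.get(path);
>>>               if (blocks == null) {
>>>                   blocks = new ArrayList<String>();
>>>                   fileToBlocks.put(path, blocks);
>>>               }
>>>               blocks.add(blockId);
>>>           }
>>>           System.out.println(fileToBlocks);
>>>       }
>>>   }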
>>>
>>> Now I would like to ask the developer community if there is any good
>>> reason not to do this, before I start to figure out where to begin an
>>> implementation of such a feature.
>>>
>>> Thanks,
>>> Mirko
>>>
>>>
>>
>
