DataNode failures should be transparent to clients. A NameNode failure will bring down the whole HDFS and result in a noticeable outage. Replicating the NameNode is on the long-term roadmap, but my impression is that it won't be happening very soon.
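For the client side, the pattern people generally use is: catch the IOException, throw away the (possibly stale) FileSystem handle, re-get one, and retry the operation. The sketch below shows that retry loop in a generic, self-contained form -- the helper names (`withRetry`, `CheckedFunction`) are mine, not Hadoop's, and the acquire/action lambdas stand in for `FileSystem.get(conf)` and whatever call you are making against the filesystem.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Generic sketch of the "discard handle, re-acquire, retry" pattern an
// HDFS client application could wrap around its FileSystem calls.
// The Hadoop-specific pieces (FileSystem.get(conf), fs.open(path), ...)
// are abstracted behind a Supplier and a CheckedFunction so this
// skeleton compiles without Hadoop on the classpath.
public class RetryOnFailure {

    @FunctionalInterface
    public interface CheckedFunction<H, R> {
        R apply(H handle) throws IOException;
    }

    // Runs `action` against a handle obtained from `acquire`. On
    // IOException the handle is assumed stale and dropped, a fresh one
    // is acquired, and the action is retried, up to maxAttempts times.
    public static <H, R> R withRetry(Supplier<H> acquire,
                                     CheckedFunction<H, R> action,
                                     int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            H handle = acquire.get();          // e.g. FileSystem.get(conf)
            try {
                return action.apply(handle);   // e.g. read from the FS
            } catch (IOException e) {
                last = e;                      // drop the stale handle, loop
            }
        }
        throw last;                            // give up after maxAttempts
    }
}
```

This is only a sketch of the shape of the recovery logic, not something out of the Hadoop source; in a real client you would also want a backoff between attempts, since a NameNode outage can easily outlast a tight retry loop.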
--Ari

On Thu, Feb 26, 2009 at 5:30 PM, Brian Long <[email protected]> wrote:
> I'm wondering what the proper actions to take in light of a NameNode or
> DataNode failure are in an application which is holding a reference to a
> FileSystem object.
> * Does the FileSystem handle all of this itself (e.g. reconnect logic)?
> * Do I need to get a new FileSystem using .get(Configuration)?
> * Does the FileSystem need to be closed before re-getting?
> * Do the answers to these questions depend on whether it's a NameNode or
>   DataNode that's failed?
>
> In short, how does an application (not a Hadoop job -- just an app using
> HDFS) properly recover from a NameNode or DataNode failure? I haven't
> figured out the magic juju yet and my applications are not handling DFS
> outages gracefully.
>
> Thanks,
> Brian

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
