DataNode failures should be transparent to clients. A NameNode failure will bring down the whole HDFS and result in a noticeable outage. Replicating the NameNode is on the long-term roadmap, but my impression is that it won't be happening very soon.
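For the client side, the pattern people generally use is: catch the IOException, throw away the (possibly stale) FileSystem handle, re-get one, and retry the operation. The sketch below shows that retry loop in a generic, self-contained form -- the helper names (`withRetry`, `CheckedFunction`) are mine, not Hadoop's, and the acquire/action lambdas stand in for `FileSystem.get(conf)` and whatever call you are making against the filesystem.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Generic sketch of the "discard handle, re-acquire, retry" pattern an
// HDFS client application could wrap around its FileSystem calls.
// The Hadoop-specific pieces (FileSystem.get(conf), fs.open(path), ...)
// are abstracted behind a Supplier and a CheckedFunction so this
// skeleton compiles without Hadoop on the classpath.
public class RetryOnFailure {

    @FunctionalInterface
    public interface CheckedFunction<H, R> {
        R apply(H handle) throws IOException;
    }

    // Runs `action` against a handle obtained from `acquire`. On
    // IOException the handle is assumed stale and dropped, a fresh one
    // is acquired, and the action is retried, up to maxAttempts times.
    public static <H, R> R withRetry(Supplier<H> acquire,
                                     CheckedFunction<H, R> action,
                                     int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            H handle = acquire.get();          // e.g. FileSystem.get(conf)
            try {
                return action.apply(handle);   // e.g. read from the FS
            } catch (IOException e) {
                last = e;                      // drop the stale handle, loop
            }
        }
        throw last;                            // give up after maxAttempts
    }
}
```

This is only a sketch of the shape of the recovery logic, not something out of the Hadoop source; in a real client you would also want a backoff between attempts, since a NameNode outage can easily outlast a tight retry loop.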
--Ari

On Thu, Feb 26, 2009 at 5:30 PM, Brian Long <[email protected]> wrote:
> I'm wondering what the proper actions to take in light of a NameNode or
> DataNode failure are in an application which is holding a reference to a
> FileSystem object.
> * Does the FileSystem handle all of this itself (e.g. reconnect logic)?
> * Do I need to get a new FileSystem using .get(Configuration)?
> * Does the FileSystem need to be closed before re-getting?
> * Do the answers to these questions depend on whether it's a NameNode or
>   DataNode that's failed?
>
> In short, how does an application (not a Hadoop job -- just an app using
> HDFS) properly recover from a NameNode or DataNode failure? I haven't
> figured out the magic juju yet and my applications are not handling DFS
> outages gracefully.
>
> Thanks,
> Brian

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
