Dan Bretherton wrote:
> We do not need to use the distributed filesystem in Hadoop because the data and home directories are available on every machine via NFS.

Writing everything over NFS will seriously hurt Hadoop's performance and is not recommended.

> 060330 130100 task_m_f8jt6q  SEVERE FSError from child
> 060330 130100 task_m_f8jt6q org.apache.hadoop.fs.FSError: java.io.IOException: Stale NFS file handle

This looks related to your use of NFS.

> java.io.IOException: Task process exit with nonzero status.
>         at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
>         at ...etc.

That means the JVM running your task crashed. Enabling core dumps might help you figure out why it crashed.
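
If you want to try that, here is a rough sketch (the exact start scripts and directory layout vary between releases, so treat these as examples): raise the core-file size limit in the shell that launches the daemons on each node, so that the child task JVMs inherit it.

  ulimit -c unlimited    # allow core files of unlimited size
  bin/stop-all.sh        # restart so new child task JVMs inherit the limit
  bin/start-all.sh
  # If a child crashes again, look for a "core" file in that task's working
  # directory (usually under mapred.local.dir on the node where it ran).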

> java.io.FileNotFoundException: /users/dab/Hadoop/input/Occam/.nfs000047ab00000001
>         at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:114)
>         at ...etc.

This looks like another NFS-related problem.

> though this was not a true test of the DFS because of our NFS setup (i.e. all the DFS blocks actually end up in my home directory on a single disk). I should also point out that the input data involved in the DFS was just a list of file names, not the temperature data itself. Using the DFS I found that the jobs often failed because of problems with missing blocks of data. Here is a typical error message from the job tracker log file.

> java.io.IOException: Could not obtain block blk_-3035035931951255964
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:362)
>         at ...etc.

> As soon as these errors start to appear, it means that the DFS is broken.

This could be related to running DFS on top of NFS. Again, I would not recommend that.

Generally, I would try running things without NFS, with local volumes for all of the mapred and dfs directories. That is the intended use.
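
As a rough sketch of what that might look like in conf/hadoop-site.xml (property names may differ slightly between releases; the paths below are placeholders for directories on each node's local disks, not NFS mounts):

  <configuration>
    <property>
      <name>mapred.local.dir</name>
      <value>/data0/hadoop/mapred/local</value>  <!-- local disk, not NFS -->
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/data0/hadoop/dfs/name</value>      <!-- namenode image and edits -->
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/data0/hadoop/dfs/data</value>      <!-- datanode block storage -->
    </property>
  </configuration>

With the blocks stored on each node's own disk, it is DFS replication, rather than NFS, that makes the data available on every machine.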

Doug
