Hi there,

I've got a simple MapReduce application that works perfectly when I use NFS as the underlying filesystem (not using HDFS at all). I also have a working HDFS configuration - the grep example runs fine with it.

However, when I try to run the same application on HDFS instead of NFS, I keep receiving an "IOException: Filesystem closed" exception and the job fails. I've spent a day searching for a solution with Google and scanning through old archives, but no results so far...

Job summary is:
--->output
10/05/26 17:29:13 INFO mapred.JobClient: Job complete: job_201005261710_0002
10/05/26 17:29:13 INFO mapred.JobClient: Counters: 4
10/05/26 17:29:13 INFO mapred.JobClient:   Job Counters
10/05/26 17:29:13 INFO mapred.JobClient:     Rack-local map tasks=12
10/05/26 17:29:13 INFO mapred.JobClient:     Launched map tasks=16
10/05/26 17:29:13 INFO mapred.JobClient:     Data-local map tasks=4
10/05/26 17:29:13 INFO mapred.JobClient:     Failed map tasks=1

Each map task attempt's log reads something like:
--->attempt_201005261710_0001_m_000000_3/syslog:
2010-05-26 17:13:47,297 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2010-05-26 17:13:47,470 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2010-05-26 17:13:47,688 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2010-05-26 17:13:47,688 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2010-05-26 17:13:47,712 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2010-05-26 17:13:47,784 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2010-05-26 17:13:47,788 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201005261710_0001_m_000000_3 is done. And is in the process of commiting
2010-05-26 17:13:47,797 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:617)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.needsTaskCommit(FileOutputCommitter.java:217)
        at org.apache.hadoop.mapred.Task.done(Task.java:671)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
2010-05-26 17:13:47,802 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
2010-05-26 17:13:47,802 WARN org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Error discarding output
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:226)
        at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:580)
        at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:227)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.abortTask(FileOutputCommitter.java:179)
        at org.apache.hadoop.mapred.Task.taskCleanup(Task.java:815)
        at org.apache.hadoop.mapred.Child.main(Child.java:191)

No reduce tasks are run, since the map tasks never manage to commit their output.

These exceptions are visible in the JobTracker's log as well. What is the reason for this exception? Is it critical? (I guess it is, but it's listed in the JobTracker's log as INFO, not ERROR.)

My config (I'm not sure which directories should be local and which should be on HDFS; maybe the issue is somewhere here?):

---->core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://blade02:5432/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop/tmp</value> <!-- local -->
</property>

</configuration>

---->hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/tmp/hadoop/name2</value> <!-- local dir where HDFS is located-->
</property>
<property>
<name>dfs.data.dir</name>
<value>/tmp/hadoop/data</value> <!-- local dir where HDFS is located -->
</property>
</configuration>

---->mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>blade02:5435</value>
</property>
<property>
<name>mapred.temp.dir</name>
<value>mapred_tmp</value> <!-- on HDFS I suppose -->
</property>
<property>
<name>mapred.system.dir</name>
<value>system</value> <!-- on HDFS I suppose -->
</property>
<property>
<name>mapred.local.dir</name>
<value>/tmp/hadoop/local</value> <!-- local -->
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:0</value>
</property>
<property>
<name>mapred.textoutputformat.separator</name>
<value>,</value>
</property>
</configuration>
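
To see where relative values like mapred.system.dir and mapred.temp.dir actually end up, a quick standalone sketch along these lines (illustrative only, not part of my job) prints them qualified against the default filesystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResolvePaths {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml etc. from the classpath
        FileSystem fs = FileSystem.get(conf);       // default FS, i.e. hdfs://blade02:5432/
        // Relative paths are qualified against the default filesystem
        // and the working directory on it:
        System.out.println(new Path("system").makeQualified(fs));
        System.out.println(new Path("mapred_tmp").makeQualified(fs));
    }
}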

I'm using Hadoop 0.20.2 (new API -> org.apache.hadoop.mapreduce.*, default OutputFormat and RecordWriter), running on a 3-node cluster (blade02, blade03, blade04). blade02 is the master, and all three nodes are slaves. My OS: Linux blade02 2.6.9-42.0.2.ELsmp #1 SMP Tue Aug 22 17:26:55 CDT 2006 i686 i686 i386 GNU/Linux.
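
The driver itself is just the standard new-API boilerplate, roughly along these lines (class names simplified here for illustration, reducer setup omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TermVectorJob {                         // illustrative driver name
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "termvector");
        job.setJarByClass(TermVectorJob.class);
        job.setMapperClass(TermVectorMapper.class);  // illustrative mapper name
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TermVector.class);
        // default output format / record writer, so FileOutputCommitter handles task commit
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}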

Note that there are currently 3 filesystems in my setup:
/tmp/* - the local filesystem on each node
/home/* - NFS, shared by all nodes - this is where Hadoop is installed
hdfs://blade02:5432/* - HDFS

I'm not sure if this is relevant, but the intermediate (key, value) pairs are of type (Text, TermVector), and TermVector's Writable methods are implemented like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class TermVector implements Writable {
    private Map<Text, IntWritable> vec = new HashMap<Text, IntWritable>();

    @Override
    public void write(DataOutput out) throws IOException {
        // write the number of entries, then each (term, count) pair
        out.writeInt(vec.size());
        for (Map.Entry<Text, IntWritable> e : vec.entrySet()) {
            e.getKey().write(out);
            e.getValue().write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // clear old state first, since Hadoop reuses Writable instances
        vec.clear();
        int n = in.readInt();
        for (int i = 0; i < n; ++i) {
            Text t = new Text();
            t.readFields(in);
            IntWritable iw = new IntWritable();
            iw.readFields(in);
            vec.put(t, iw);
        }
    }
    ...
}
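
A plain serialization round-trip outside of MapReduce, roughly like the sketch below, can be used to sanity-check the Writable (the commented-out add() call stands in for a hypothetical accessor that isn't shown above):

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

public class TermVectorRoundTrip {
    public static void main(String[] args) throws Exception {
        TermVector original = new TermVector();
        // original.add(new Text("term"), new IntWritable(3));  // hypothetical accessor

        DataOutputBuffer out = new DataOutputBuffer();
        original.write(out);                        // serialize

        DataInputBuffer in = new DataInputBuffer();
        in.reset(out.getData(), out.getLength());
        TermVector copy = new TermVector();
        copy.readFields(in);                        // deserialize into a fresh instance
    }
}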

Any help appreciated.

Many thanks,
Marcin Sieniek
