Hi All,

I am writing a sequence file to HDFS from an application as a pre-processing step 
for a MapReduce job.  (It isn't written from an MR job; I just open, write, and close.)

The file is around 32 MB in size.  When the MapReduce job starts, it launches 
256 map tasks.  I am writing SequenceFiles from this first job and firing up a 
second job with the first job's output.  The second job has around 32 KB of 
input and 138 map tasks.  There are 128 part files, so I would expect at most 
128 map tasks for this second job.  This seems like an unusually large number of 
map tasks, since the cluster is configured with the default block size of 64 MB.  
I am using Hadoop v0.20.1.
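
In case it helps, here is a rough sketch (not my actual code; the class name and 
input path below are just placeholders) of how I could ask the old mapred API in 
0.20.1 how many splits it computes for the second job's input directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class SplitCount {
    public static void main(String[] args) throws Exception {
        // "second_job_input" stands in for the first job's output directory
        JobConf job = new JobConf(new Configuration(), SplitCount.class);
        job.setInputFormat(SequenceFileInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path("second_job_input"));

        // the second argument is only a hint; the actual split count depends
        // on the input files, the block size, and mapred.min.split.size
        InputSplit[] splits = job.getInputFormat().getSplits(job, 1);
        System.out.println("computed splits: " + splits.length);
        for (InputSplit split : splits) {
            System.out.println(split);  // a FileSplit prints file:start+length
        }
    }
}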

Is there something special about how the SequenceFiles are being written?  Below 
is a code sample showing how I write the first file.

Thanks,
Adam


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

FileSystem fs = FileSystem.get(new Configuration());
// <path_to_file> is the Path the file gets written to
SequenceFile.Writer wrtr = SequenceFile.createWriter(fs, fs.getConf(), <path_to_file>,
        Text.class, Text.class);

// append every (s1, s2) pair as a Text key/value record
for (String s1 : strings1) {
    for (String s2 : strings2) {
        wrtr.append(new Text(s1), new Text(s2));
    }
}

wrtr.close();
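
And for completeness, a quick sketch (again not my real code; the path is a 
placeholder) of how I can check what length and block size HDFS actually 
recorded for the pre-processed file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// "path/to/preprocessed/file" stands in for the real file path
FileSystem fs = FileSystem.get(new Configuration());
FileStatus status = fs.getFileStatus(new Path("path/to/preprocessed/file"));
System.out.println("length = " + status.getLen()
        + ", block size = " + status.getBlockSize());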
