Hi All,

I am writing a SequenceFile to HDFS from an application as a pre-processing step for a MapReduce job. (It isn't being written from an MR job; just open, write, close.)
The file is around 32 MB in size. When the MapReduce job starts up, it starts with 256 map tasks. I am writing SequenceFiles from this first job and firing up a second job with the first job's output. The second job has around 32 KB of input but runs with 138 map tasks. There are 128 part files, so I would expect only 128 map tasks for this second job.

This seems like an unusually large number of map tasks, since the cluster is configured with the default block size of 64 MB. I am using Hadoop v0.20.1. Is there something special about how the SequenceFiles are being written?

As for how I am writing the first file, below is a code sample.

Thanks,
Adam

    FileSystem fs = FileSystem.get(new Configuration());
    SequenceFile.Writer wrtr = SequenceFile.createWriter(fs, fs.getConf(),
        <path_to_file>, Text.class, Text.class);
    // Write every (s1, s2) pair as a Text key/value record.
    for (String s1 : strings1) {
        for (String s2 : strings2) {
            wrtr.append(new Text(s1), new Text(s2));
        }
    }
    wrtr.close();
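P.S. In case it helps with diagnosis, here is a rough sketch of how I could list the second job's input directory to confirm the part-file count and sizes. The output path below is just a placeholder, not what I actually run:

    // Assumed imports: org.apache.hadoop.conf.Configuration,
    // org.apache.hadoop.fs.{FileSystem, FileStatus, Path}
    FileSystem fs = FileSystem.get(new Configuration());
    // Placeholder path for the first job's output directory.
    FileStatus[] parts = fs.listStatus(new Path("/path/to/job1/output"));
    long totalBytes = 0;
    for (FileStatus stat : parts) {
        System.out.println(stat.getPath().getName() + " -> " + stat.getLen() + " bytes");
        totalBytes += stat.getLen();
    }
    System.out.println(parts.length + " part files, " + totalBytes + " bytes total");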