Thanks a lot, Harsh. I am fairly sure it is a logic issue on the reducer side: when reducer=1 it works as expected. But the counter output also gives the expected result irrespective of the number of reducers. Here is the counter output:

12/01/24 17:02:16 INFO mapred.JobClient: NUM_RECORDS=66
12/01/24 17:02:16 INFO mapred.JobClient: Job Counters
12/01/24 17:02:16 INFO mapred.JobClient:   Launched reduce tasks=4
12/01/24 17:02:16 INFO mapred.JobClient:   Launched map tasks=2
12/01/24 17:02:16 INFO mapred.JobClient:   Data-local map tasks=2
12/01/24 17:02:16 INFO mapred.JobClient: FileSystemCounters
12/01/24 17:02:16 INFO mapred.JobClient:   FILE_BYTES_READ=1028
12/01/24 17:02:16 INFO mapred.JobClient:   HDFS_BYTES_READ=984
12/01/24 17:02:16 INFO mapred.JobClient:   FILE_BYTES_WRITTEN=2288
12/01/24 17:02:16 INFO mapred.JobClient:   HDFS_BYTES_WRITTEN=5139
12/01/24 17:02:16 INFO mapred.JobClient: Map-Reduce Framework
12/01/24 17:02:16 INFO mapred.JobClient:   Reduce input groups=6
12/01/24 17:02:16 INFO mapred.JobClient:   Combine output records=0
12/01/24 17:02:16 INFO mapred.JobClient:   Map input records=6
12/01/24 17:02:16 INFO mapred.JobClient:   Reduce shuffle bytes=873
12/01/24 17:02:16 INFO mapred.JobClient:   Reduce output records=66
12/01/24 17:02:16 INFO mapred.JobClient:   Spilled Records=12
12/01/24 17:02:16 INFO mapred.JobClient:   Map output bytes=992
12/01/24 17:02:16 INFO mapred.JobClient:   Map input bytes=794
12/01/24 17:02:16 INFO mapred.JobClient:   Combine input records=0
12/01/24 17:02:16 INFO mapred.JobClient:   Map output records=6
12/01/24 17:02:16 INFO mapred.JobClient:   Reduce input records=6
It says Reduce input records=6 and Reduce output records=66, but there are actually only 22 output records from the reducers. I use a custom output format:

public class CustomMultipleTextOutputFormat<K, V>
        extends MultipleTextOutputFormat<K, V> {

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        String[] keys = key.toString().split("%");
        if (keys.length != 3) {
            return key.toString();
        }
        return keys[2];
    }
}

I am not sure what I am missing. Any suggestion would be appreciated.

Thanks,
Tamil

On Sun, Jan 22, 2012 at 1:24 AM, Harsh J <ha...@cloudera.com> wrote:
> The only difference would be that with 4 reducers your keys would get
> partitioned based on their hashCode() implementation (if you use the
> default hash partitioner), and each be sent to one reducer. I'd check
> the key implementation here first, if it is a custom key.
>
> Check the input record counters on your reducers and the total map output
> record counter - the former should add up and be equal to the latter. Also
> make sure you aren't skipping the reducer's value iterator under any
> condition when you are doing the reduce operation.
>
> I'm guessing it is mostly your logic that's somehow causing this, but I do
> not have your source bits to say that for sure.
>
> On 21-Jan-2012, at 11:47 PM, Thamizhannal Paramasivam wrote:
>
> Hi All,
> I am experimenting with a MapReduce program on Hadoop 0.19. The program
> has a single input file with 7 records (later it can have many records
> across multiple files), and each input record is supposed to produce 11
> output records. When it runs with no_of_reducers=4, it produces only 33
> records. But when I ran it with no_of_reducers=1, it produced 77 records
> as expected.
>
> What could be the reason for this? Am I missing any configuration
> parameter?
>
> Thanks,
> Tamil
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
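To illustrate Harsh's point about the default hash partitioner: with N reducers, each key is routed by (key.hashCode() & Integer.MAX_VALUE) % N, so a custom key type that does not override hashCode() consistently with equals() can scatter logically equal keys across reducers (and with N=1 the problem is masked, since everything lands in partition 0). The sketch below is not the poster's actual key class; KeyHashDemo, GoodKey, and the sample key strings are hypothetical names used only to demonstrate the routing math.

```java
// Minimal sketch of how Hadoop's default HashPartitioner assigns keys to
// reducers. A custom key must override equals() AND hashCode() so that
// equal keys always hash identically; otherwise their values are split
// across different reduce tasks and never grouped together.
public class KeyHashDemo {

    // Hypothetical custom key wrapping a String, with a correct
    // equals()/hashCode() pair delegating to the wrapped value.
    static final class GoodKey {
        final String k;
        GoodKey(String k) { this.k = k; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).k.equals(k);
        }
        @Override public int hashCode() { return k.hashCode(); }
    }

    // Same computation the default hash partitioner performs:
    // (hashCode masked to non-negative) modulo the reducer count.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Two equal keys map to the same reducer when hashCode is overridden.
        GoodKey a = new GoodKey("user%2012%part-a");
        GoodKey b = new GoodKey("user%2012%part-a");
        System.out.println(partition(a, 4) == partition(b, 4)); // true

        // With a single reducer every key lands in partition 0, which is
        // why a broken key often "works" when no_of_reducers=1.
        System.out.println(partition(a, 1)); // 0

        // With the identity hashCode inherited from Object, two distinct
        // instances representing the "same" key usually hash differently,
        // so their records scatter across reducers.
        Object c = new Object();
        Object d = new Object();
        System.out.println(partition(c, 4) == partition(d, 4)); // frequently false
    }
}
```

If the job key is a custom Writable, checking that its hashCode() is derived from the same fields as equals() (as Harsh suggests) is the first thing to verify.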