Having reread your requirement carefully (going by your example input/output), I am not sure exactly what you need. What do you want to sort on: the URLs, their counts (the solution I pointed out earlier), or some combination of both?

-----Original Message-----
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Monday, July 02, 2007 10:23 AM
To: [email protected]
Subject: RE: DecreasingComparator

Ignore my response.. I read your mail wrongly and assumed that you wanted to sort by the decreasing order of counts of the url words.

-----Original Message-----
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Monday, July 02, 2007 10:09 AM
To: [email protected]
Subject: RE: DecreasingComparator

Your first MapReduce phase is very similar to the WordCount example. The only difference is that you need to create LongWritable objects for the values. The output format should be SequenceFileOutputFormat.class.

Run a subsequent MapReduce phase with the input format set to SequenceFileInputFormat.class, the map class set to InverseMapping.class, and, the OutputKeyComparator set to LongWritable.DecreasingComparator.class.

By the way, the 2nd mapreduce phase won't work unless you patch your version of hadoop with https://issues.apache.org/jira/secure/attachment/12360717/1535_01.patch . This hasn't been committed yet.

-----Original Message-----
From: Peter W. [mailto:[EMAIL PROTECTED]
Sent: Monday, July 02, 2007 6:08 AM
To: [email protected]
Subject: DecreasingComparator

Hello,

I have a modified WordCount program with the following characteristics:

input file:

urla.com,urlb.com
urla.com,urlc.com
urlb.com,urlc.com
urlc.com,urla.com
urld.com,urlc.com

mapreduce output:

urla.com 3
urlb.com 2
urlc.com 4
urld.com 1

Next, tried using a comparator with a different JobConf and mapreduce:

jc.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class);

but it didn't work because the values are IntWritable and my OutputCollector wasn't picking up the right things...

What do I need to collect in both the map and reduce for the final result to sort descending high-low?

Thanks,

Peter W.
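
If sorting by decreasing count is what is wanted, the two-phase idea from this thread (count first, then swap key and value and order the counts from high to low) can be sketched in plain Java. This is only a simulation of the data flow, not a Hadoop job: it uses no Hadoop classes, and the class name DescendingCountSketch and its two methods are made up for illustration. The counts are kept as long values to mirror the LongWritable suggestion, and the key/value swap stands in for the inverse-mapping step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java simulation of the two phases; no Hadoop classes are used.
class DescendingCountSketch {

    // Phase 1 (WordCount-like): split each line on commas and count every URL.
    static Map<String, Long> countUrls(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String url : line.split(",")) {
                counts.merge(url.trim(), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Phase 2: invert each (url, count) pair so the count becomes the key,
    // then order the keys from high to low.
    static List<String> sortedByCountDescending(Map<String, Long> counts) {
        List<Map.Entry<Long, String>> inverted = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            inverted.add(new AbstractMap.SimpleEntry<>(e.getValue(), e.getKey()));
        }
        // Decreasing order on the count key; the order among equal counts is unspecified.
        inverted.sort((a, b) -> Long.compare(b.getKey(), a.getKey()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : inverted) {
            result.add(e.getKey() + " " + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The input file from the original question.
        List<String> lines = Arrays.asList(
                "urla.com,urlb.com",
                "urla.com,urlc.com",
                "urlb.com,urlc.com",
                "urlc.com,urla.com",
                "urld.com,urlc.com");
        for (String row : sortedByCountDescending(countUrls(lines))) {
            System.out.println(row);
        }
        // Prints:
        // 4 urlc.com
        // 3 urla.com
        // 2 urlb.com
        // 1 urld.com
    }
}
```

In an actual job the second step would not sort in memory like this; the framework orders the keys during the shuffle, which is why the comparator is set on the output key rather than applied to the values.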
