Robert, I will take a shot at it. I think it would be about writing a custom comparator and a partitioner, reading some config parameters and sending the counters as key/value pairs to the reducers. It shouldn't be that difficult.
If I am stuck, I will post in the forum. I will also learn how to create a patch.

Regards,
Praveen

On Thu, Dec 8, 2011 at 9:45 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Sorry I have not responded sooner; I have had a number of fires at work to
> put out, and I haven't been keeping up with the user mailing lists. The
> code I did before was very specific to the task I was working on, and it
> was an ugly hack because I did not bother with the comparator; I already
> knew there was only a small predefined set of keys, so I just output one
> set of metadata for each key.
>
> I would be happy to put something like this into the map/reduce framework.
> I have filed https://issues.apache.org/jira/browse/MAPREDUCE-3520 for
> this. I just don't know when I will have the time to do that, especially
> with my work on the 0.23 release. I'll also talk to my management to see
> if they want to allow me to work on this during work, or if it will have
> to be in my spare time. Please feel free to comment on the JIRA or vote
> for it if you feel that it is something that you want done. Or if you feel
> comfortable helping out, perhaps you could take a first crack at it.
>
> Thanks,
>
> Bobby Evans
>
>
> On 12/6/11 9:14 AM, "Mapred Learn" <mapred.le...@gmail.com> wrote:
>
> Hi Praveen,
> Could you share it here so that we can use it?
>
> Thanks,
>
> Sent from my iPhone
>
> On Dec 6, 2011, at 6:29 AM, Praveen Sripati <praveensrip...@gmail.com>
> wrote:
>
> Robert,
>
> I have made the above thing work.
>
> Any plans to make it into the Hadoop framework? There have been similar
> queries about it in other forums as well. If you need any help testing,
> documenting, or anything else, please let me know.
>
> Regards,
> Praveen
>
> On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> Anurag,
>
> The current set of counter APIs from within the Map or Reduce process are
> write-only.
> They are not intended to be used for reading data from other tasks. They
> are there to be used for collecting statistics about the job as a whole.
> If you use too many of them, the performance of the system as a whole can
> get very bad, because they are stored in memory on the JobTracker. Also,
> there is the potential that a map task that has finished "successfully"
> can later fail if the node it is running on dies before all of the map
> output can be fetched by all of the reducers. This could result in a
> reducer reading in counter data that is only partial or out of date. You
> may be able to access it through the job API, but I would not recommend
> it, and I think there may be some issues with security if you have
> security enabled, but I don't know for sure.
>
> If you have an optimization that really needs summary data from each
> mapper in all reducers, then you should do it in a map/reduce way. Output
> a special key/value pair when a mapper finishes, one for each reducer,
> with the statistics in it. You can know how many reducers there are
> because that is set in the configuration. You then need a special
> partitioner to recognize those summary key/value pairs and make sure that
> they each go to the proper reducer. You also need a special comparator to
> make sure that these special keys are the very first ones read by the
> reducer, so it can have the data before processing anything else.
>
> I would also recommend that you don't try to store this data in HDFS. You
> can very easily do a DDOS on the namenode on a large cluster, and then
> your ops will yell at you, as they did with me before I stopped doing it.
> I have made the above thing work. It is just a lot of work to do it right.
>
> --Bobby Evans
>
>
> On 12/1/11 1:18 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:
>
> Can you access it via the Job API?
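[Editor's note: the scheme Bobby describes (tagged summary records, a special partitioner, and a comparator that sorts summary keys first) can be sketched in plain Java. This is a hedged, Hadoop-free illustration: the `SummaryRouting` class, the `"\u0000SUM:"` prefix, and the method names are made up for this sketch; a real implementation would put the same logic in subclasses of `org.apache.hadoop.mapreduce.Partitioner` and a `RawComparator`, with the mapper emitting one summary pair per reducer from its cleanup method.]

```java
// Sketch of summary-record routing, assuming a reserved key prefix that
// cannot appear in ordinary keys. All names here are hypothetical.
public class SummaryRouting {
    static final String SUMMARY_PREFIX = "\u0000SUM:"; // hypothetical marker

    // Partitioner logic: a summary key "\u0000SUM:<n>" is routed to
    // reducer n; ordinary keys are hash-partitioned the way Hadoop's
    // default HashPartitioner does it.
    static int partition(String key, int numReducers) {
        if (key.startsWith(SUMMARY_PREFIX)) {
            return Integer.parseInt(key.substring(SUMMARY_PREFIX.length()));
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort-comparator logic: every summary key orders before every
    // ordinary key, so each reducer sees its summary data before it
    // processes any real records.
    static int compare(String a, String b) {
        boolean aIsSummary = a.startsWith(SUMMARY_PREFIX);
        boolean bIsSummary = b.startsWith(SUMMARY_PREFIX);
        if (aIsSummary != bIsSummary) {
            return aIsSummary ? -1 : 1;
        }
        return a.compareTo(b);
    }
}
```

The number of reducers is available to the mapper from the job configuration (as Bobby notes), so cleanup can loop `for (int n = 0; n < numReducers; n++)` and emit one `"\u0000SUM:" + n` record carrying the mapper's statistics.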
> http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29
>
> > Hi,
> >
> > I have a similar query.
> >
> > In fact, I sent it yesterday and am waiting for a response from anybody
> > who might have done it.
> >
> > Thanks,
> > Anurag Tangri
> >
> > 2011/11/30 rabbit_cheng <rabbit_ch...@126.com>
> >
> > > I have created a counter in the mapper to count something, and I want
> > > to get the counter's value in the reduce phase. The code segment is
> > > as follows:
> > >
> > > public class MM extends Mapper<LongWritable, Text, Text, Text> {
> > >     static enum TEST { pt }
> > >
> > >     @Override
> > >     public void map(LongWritable key, Text values, Context context)
> > >             throws IOException, InterruptedException {
> > >         context.getCounter(TEST.pt).increment(1);
> > >     }
> > > }
> > >
> > > public class KMeansReducer extends Reducer<Text, Text, Text, Text> {
> > >     @Override
> > >     protected void setup(Context context)
> > >             throws IOException, InterruptedException {
> > >         long ptValue = context.getCounter(MM.TEST.pt).getValue();
> > >     }
> > > }
> > >
> > > But what I get is always 0, i.e., the value of the variable ptValue
> > > is always 0. Does anybody know how to access a mapper counter in the
> > > reducer?
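[Editor's note: the reliable way to read the counter in question is the one the Job API link above points at: read aggregated counters from the driver after the job completes. Mapper counters are aggregated on the JobTracker as tasks report in, which is why `getValue()` inside a reducer's `setup()` returns 0. A minimal driver-side sketch, assuming rabbit_cheng's `MM` mapper class from the code above and a job that is otherwise already configured:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CounterDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "counter-demo"); // new Job(conf, ...) on older releases
        // ... set mapper (MM), reducer, input/output paths here ...

        // Counters are only trustworthy once the job has finished.
        if (job.waitForCompletion(true)) {
            long pt = job.getCounters().findCounter(MM.TEST.pt).getValue();
            System.out.println("pt = " + pt);
        }
    }
}
```

If the value is needed *inside* the reducers rather than after the job, the summary key/value scheme Bobby outlines earlier in the thread is the map/reduce way to do it.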