I used to declare various counters in my map/reduce classes to keep track of various statistics about my jobs. Typically, those counters are initialized in the configure method, updated in the map/reduce methods, and finalized in the close method. However, the counters exist on a per-task basis, and I don't have a good way to aggregate them across an entire job. It would be nice to introduce an API that lets each map/reduce task initialize its own counters and register them with the job. At the end of the job, the job tracker could automatically aggregate them and make them available through an API or through the job status.
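To make the proposal concrete, here is a minimal sketch of what such an API could look like. All of the class and method names (JobCounters, TaskCounters, commit, and so on) are invented for illustration; nothing like this exists in Hadoop yet, and a real implementation would have to ship the per-task totals from each task to the job tracker rather than merge them in-process.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each task gets its own TaskCounters, updates it
// from map()/reduce(), and commits it in close(); the job-level object
// sums the committed values across all tasks.
public class JobCounters {
    // Job-wide totals, keyed by counter name.
    private final Map<String, Long> totals = new ConcurrentHashMap<>();

    // Per-task view; cheap local updates, merged into the job on commit.
    public class TaskCounters {
        private final Map<String, Long> local = new ConcurrentHashMap<>();

        public void increment(String name, long delta) {
            local.merge(name, delta, Long::sum);
        }

        // Called from the task's close() method.
        public void commit() {
            for (Map.Entry<String, Long> e : local.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
            local.clear();
        }
    }

    public TaskCounters newTaskCounters() { return new TaskCounters(); }

    // Job-tracker side: read the aggregated value once the job is done.
    public long get(String name) { return totals.getOrDefault(name, 0L); }

    public static void main(String[] args) {
        JobCounters job = new JobCounters();
        TaskCounters t1 = job.newTaskCounters();
        TaskCounters t2 = job.newTaskCounters();
        t1.increment("records.processed", 100);
        t2.increment("records.processed", 50);
        t1.commit();
        t2.commit();
        System.out.println(job.get("records.processed")); // 150
    }
}
```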
Runping

> -----Original Message-----
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 02, 2007 4:37 AM
> To: [email protected]
> Subject: Per-job counters
>
> Hi,
>
> I'm trying to figure out how to implement per-job counters. Google's
> paper on map-reduce mentions that their API allows individual tasks to
> update global counters, defined for each job, and then easily retrieve
> them when the job is completed.
>
> Example: process some records in a map-reduce job (with many map and
> reduce tasks), and at the end of the job emit the total count of
> processed records for the whole job (or any other programmer-defined
> count aggregated during processing).
>
> I was looking at the metrics API, but it's not obvious to me whether it
> is useful in this case ... if so, how should I go about it?
>
> I could probably implement extended OutputFormat-s that write these
> counters for each task to a separate output file, and then read them at
> the end of the job, but this seems awfully intrusive and complex for
> such a simple piece of functionality...
>
> I'd appreciate any suggestions.
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
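For reference, the side-file workaround Andrzej describes could be sketched roughly as below. The file layout, the "counter-" naming scheme, and both method names are invented here for illustration; a real version would write to the job's output filesystem (e.g. DFS) instead of a local directory.

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the workaround: each task writes its final counter value to
// its own side file, and the driver sums the files after the job finishes.
public class SideFileCounters {

    // Task side: called from close(), records this task's local count.
    public static void writeTaskCount(Path dir, String taskId, long count)
            throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve("counter-" + taskId),
                    Long.toString(count).getBytes());
    }

    // Driver side: after the job completes, sum every task's side file.
    public static long aggregate(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(dir, "counter-*")) {
            for (Path p : files) {
                total += Long.parseLong(
                    new String(Files.readAllBytes(p)).trim());
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("counters");
        writeTaskCount(dir, "task_0001", 40);
        writeTaskCount(dir, "task_0002", 2);
        System.out.println(aggregate(dir)); // 42
    }
}
```

This works, but as the mail notes it is intrusive: every job has to plumb an extra output path through its OutputFormat, which is exactly what a framework-level counter API would avoid.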
