Simplest case: if all you need is the combined line count across A, B, and C, look at the "Reduce output records" counter that is reported for every job. Like the counters the others mentioned, it can be read programmatically and printed explicitly, or just read off the job summary with your own eyes once the job finishes.
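If you want to grab it from the driver rather than the job summary, something like this works (a rough sketch against the Hadoop 2.x mapreduce API; the job setup is elided and the job name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class TotalRowCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "row-count-example"); // placeholder name
    // ... configure mapper, reducer, input and output paths here ...

    job.waitForCompletion(true);

    // Built-in counter: total records emitted by all reducers.
    long total = job.getCounters()
        .findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue();
    System.out.println("Reduce output records: " + total);
  }
}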
Cheers,
James

On Tue, Mar 8, 2011 at 3:29 AM, Harsh J <[email protected]> wrote:
> I think the previous reply wasn't very accurate. So you need a count
> per-file? One way I can think of doing that, via the job itself, is to
> use Counter to count the "name of the output + the task's ID". But it
> would not be a good solution if there are several hundreds of tasks.
>
> A distributed count can be performed on a single file, however, using
> an identity mapper + null output and then looking at map-input-records
> counter after completion.
>
> On Tue, Mar 8, 2011 at 3:54 PM, Harsh J <[email protected]> wrote:
>> Count them as you sink using the Counters functionality of Hadoop
>> Map/Reduce (If you're using MultipleOutputs, it has a way to enable
>> counters for each name used). You can then aggregate related counters
>> post-job, if needed.
>>
>> On Tue, Mar 8, 2011 at 3:11 PM, Jun Young Kim <[email protected]> wrote:
>>> Hi.
>>>
>>> my hadoop application generated several output files by a single job.
>>> (for example, A, B, C are generated as a result)
>>>
>>> after finishing a job, I want to count each files' row counts.
>>>
>>> is there any way to count each files?
>>>
>>> thanks.
>>>
>>> --
>>> Junyoung Kim ([email protected])
>>>
>>
>> --
>> Harsh J
>> www.harshj.com
>
> --
> Harsh J
> www.harshj.com
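To illustrate Harsh's per-file suggestion: MultipleOutputs.setCountersEnabled(job, true) is the built-in way to get a counter per named output, but a reducer can also bump one explicit counter per output name as it writes, and the driver can read the totals back after the job. A rough sketch only; the counter group "RowCounts", the output names, and the key/value types are made-up placeholders, so verify the MultipleOutputs details against your Hadoop version:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class PerFileCountReducer
    extends Reducer<Text, Text, NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Assumption: the key ("A", "B", "C", ...) decides which file a row goes to.
    String outputName = key.toString();
    for (Text value : values) {
      // write(key, value, baseOutputPath): files come out as A-r-00000 etc.
      mos.write(NullWritable.get(), value, outputName);
      // One custom counter per output name; Hadoop sums these across all tasks.
      context.getCounter("RowCounts", outputName).increment(1);
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();
  }
}

// In the driver, after job.waitForCompletion(true):
//   long rowsInA = job.getCounters().findCounter("RowCounts", "A").getValue();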
